Hangbin Liu says:
====================
This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.
This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to "on" to make configure
abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency), or to "off" to disable the libbpf
check and build iproute2 with the legacy bpf code.
The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.
The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).
Finally, thanks a lot to Toke for his help on this patch set.
v6:
a) print runtime libbpf version in ip -V and tc -V
v5:
a) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
dest.
b) Fix bpf_prog_load_dev typo.
c) rebase to latest iproute2-next.
v4:
a) Make the LIBBPF_FORCE variable control whether iproute2 is built
with libbpf or not.
b) Add new file bpf_glue.c for mixed libbpf/legacy bpf calls.
c) Fix some build issues and a shell compatibility error.
v3:
a) Update configure to check for function bpf_program__section_name() separately
b) Add a new function get_bpf_program__section_name() to choose whether to
use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
libbpf-devel-0.1.0-1.fc33
v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.
Here are the test results with patched iproute2:
== Show libbpf version
$ ip -V
ip utility, iproute2-5.9.0, libbpf 0.1.0
$ tc -V
tc utility, iproute2-5.9.0, libbpf 0.1.0
== setup env
$ clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
$ clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
$ rm -rf /sys/fs/bpf/xdp/globals
$ /root/iproute2/ip/ip link add type veth
$ /root/iproute2/ip/ip link set veth0 up
$ /root/iproute2/ip/ip link set veth1 up
== Load objs
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 4 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
1: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
4: xdp name cls_aaa tag 3056d2382e53f27c gpl
loaded_at 2020-10-22T08:04:21-0400 uid 0
xlated 80B jited 71B memlock 4096B
btf_id 5
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 8 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc map_inner map_outer
$ bpftool map show
1: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
2: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
3: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
8: xdp name imain tag 4420e72b2a601ed7 gpl
loaded_at 2020-10-22T08:04:23-0400 uid 0
xlated 336B jited 193B memlock 4096B map_ids 3
btf_id 10
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 12 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh
/sys/fs/bpf/xdp/globals:
jmp_tc map_inner map_outer
$ bpftool map show
1: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
2: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
3: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
4: array name map_sh flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
12: xdp name imain tag 9cbab549c3af3eab gpl
loaded_at 2020-10-22T08:04:25-0400 uid 0
xlated 224B jited 139B memlock 4096B map_ids 4
btf_id 15
$ /root/iproute2/ip/ip link set veth0 xdp off
== Load objs again to make sure maps could be reused
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 16 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh
/sys/fs/bpf/xdp/globals:
jmp_tc map_inner map_outer
$ bpftool map show
1: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
2: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
3: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
4: array name map_sh flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
16: xdp name cls_aaa tag 3056d2382e53f27c gpl
loaded_at 2020-10-22T08:04:27-0400 uid 0
xlated 80B jited 71B memlock 4096B
btf_id 20
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 20 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh
/sys/fs/bpf/xdp/globals:
jmp_tc map_inner map_outer
$ bpftool map show
1: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
2: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
3: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
4: array name map_sh flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
20: xdp name imain tag 4420e72b2a601ed7 gpl
loaded_at 2020-10-22T08:04:29-0400 uid 0
xlated 336B jited 193B memlock 4096B map_ids 3
btf_id 25
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 24 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh
/sys/fs/bpf/xdp/globals:
jmp_tc map_inner map_outer
$ bpftool map show
1: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
2: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
3: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
4: array name map_sh flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
24: xdp name imain tag 9cbab549c3af3eab gpl
loaded_at 2020-10-22T08:04:31-0400 uid 0
xlated 224B jited 139B memlock 4096B map_ids 4
btf_id 30
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
== Testing if we can load new-style objects (using xdp-filter as an example)
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 28 tag e29eeda1489a6520 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet filter_ipv4 filter_ipv6 filter_ports xdp_stats_map
$ bpftool map show
5: percpu_array name xdp_stats_map flags 0x0
key 4B value 16B max_entries 5 memlock 4096B
btf_id 35
6: percpu_array name filter_ports flags 0x0
key 4B value 8B max_entries 65536 memlock 1576960B
btf_id 35
7: percpu_hash name filter_ipv4 flags 0x0
key 4B value 8B max_entries 10000 memlock 1064960B
btf_id 35
8: percpu_hash name filter_ipv6 flags 0x0
key 16B value 8B max_entries 10000 memlock 1142784B
btf_id 35
9: percpu_hash name filter_ethernet flags 0x0
key 6B value 8B max_entries 10000 memlock 1064960B
btf_id 35
$ bpftool prog show
28: xdp name xdpfilt_alw_all tag e29eeda1489a6520 gpl
loaded_at 2020-10-22T08:04:33-0400 uid 0
xlated 2408B jited 1405B memlock 4096B map_ids 9,5,7,8,6
btf_id 35
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet filter_ipv4 filter_ipv6 filter_ports xdp_stats_map
$ bpftool map show
5: percpu_array name xdp_stats_map flags 0x0
key 4B value 16B max_entries 5 memlock 4096B
btf_id 35
6: percpu_array name filter_ports flags 0x0
key 4B value 8B max_entries 65536 memlock 1576960B
btf_id 35
7: percpu_hash name filter_ipv4 flags 0x0
key 4B value 8B max_entries 10000 memlock 1064960B
btf_id 35
8: percpu_hash name filter_ipv6 flags 0x0
key 16B value 8B max_entries 10000 memlock 1142784B
btf_id 35
9: percpu_hash name filter_ethernet flags 0x0
key 6B value 8B max_entries 10000 memlock 1064960B
btf_id 35
$ bpftool prog show
32: xdp name xdpfilt_alw_ip tag 2f2b9dbfb786a5a2 gpl
loaded_at 2020-10-22T08:04:35-0400 uid 0
xlated 1336B jited 778B memlock 4096B map_ids 7,8,5
btf_id 40
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 36 tag 18c1bb25084030bc jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet filter_ipv4 filter_ipv6 filter_ports xdp_stats_map
$ bpftool map show
5: percpu_array name xdp_stats_map flags 0x0
key 4B value 16B max_entries 5 memlock 4096B
btf_id 35
6: percpu_array name filter_ports flags 0x0
key 4B value 8B max_entries 65536 memlock 1576960B
btf_id 35
7: percpu_hash name filter_ipv4 flags 0x0
key 4B value 8B max_entries 10000 memlock 1064960B
btf_id 35
8: percpu_hash name filter_ipv6 flags 0x0
key 16B value 8B max_entries 10000 memlock 1142784B
btf_id 35
9: percpu_hash name filter_ethernet flags 0x0
key 6B value 8B max_entries 10000 memlock 1064960B
btf_id 35
$ bpftool prog show
36: xdp name xdpfilt_alw_tcp tag 18c1bb25084030bc gpl
loaded_at 2020-10-22T08:04:37-0400 uid 0
xlated 1128B jited 690B memlock 4096B map_ids 6,5
btf_id 45
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals
== Load new btf defined maps
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 40 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
10: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
40: xdp name cls_aaa tag 3056d2382e53f27c gpl
loaded_at 2020-10-22T08:04:39-0400 uid 0
xlated 80B jited 71B memlock 4096B
btf_id 50
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 44 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc map_outer
$ bpftool map show
10: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
11: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
13: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
44: xdp name imain tag 4420e72b2a601ed7 gpl
loaded_at 2020-10-22T08:04:41-0400 uid 0
xlated 336B jited 193B memlock 4096B map_ids 13
btf_id 55
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 48 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc map_outer map_sh
$ bpftool map show
10: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
11: array name map_inner flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
13: array_of_maps name map_outer flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
14: array name map_sh flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
48: xdp name imain tag 9cbab549c3af3eab gpl
loaded_at 2020-10-22T08:04:43-0400 uid 0
xlated 224B jited 139B memlock 4096B map_ids 14
btf_id 60
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals
== Test load objs by tc
$ /root/iproute2/tc/tc qdisc add dev veth0 ingress
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
$ /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
$ ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc
/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex jmp_tc map_sh
/sys/fs/bpf/xdp/globals:
jmp_tc
$ bpftool map show
15: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
owner_prog_type sched_cls owner jited
16: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
owner_prog_type sched_cls owner jited
17: prog_array name jmp_ex flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
owner_prog_type sched_cls owner jited
18: prog_array name jmp_tc flags 0x0
key 4B value 4B max_entries 2 memlock 4096B
owner_prog_type sched_cls owner jited
19: array name map_sh flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
$ bpftool prog show
52: sched_cls name cls_loop tag 3e98a40b04099d36 gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 168B jited 133B memlock 4096B map_ids 15
btf_id 65
56: sched_cls name cls_entry tag 0fbb4d9310a6ee26 gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 144B jited 121B memlock 4096B map_ids 16
btf_id 70
60: sched_cls name cls_case1 tag e06a3bd62293d65d gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 328B jited 216B memlock 4096B map_ids 19,17
btf_id 75
66: sched_cls name cls_case1 tag e06a3bd62293d65d gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 328B jited 216B memlock 4096B map_ids 19,17
btf_id 80
72: sched_cls name cls_case1 tag e06a3bd62293d65d gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 328B jited 216B memlock 4096B map_ids 19,17
btf_id 85
78: sched_cls name cls_case1 tag e06a3bd62293d65d gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 328B jited 216B memlock 4096B map_ids 19,17
btf_id 90
79: sched_cls name cls_case2 tag ee218ff893dca823 gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 336B jited 218B memlock 4096B map_ids 19,18
btf_id 90
80: sched_cls name cls_exit tag e78a58140deed387 gpl
loaded_at 2020-10-22T08:04:45-0400 uid 0
xlated 288B jited 177B memlock 4096B map_ids 19
btf_id 90
I also ran the following upstream kselftests with the patched iproute2,
and all of them passed.
test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh
====================
Signed-off-by: David Ahern <dsahern@gmail.com>
Users should try to use the new BTF-defined maps instead of maps defined
with struct bpf_elf_map. The tail call examples are not added yet,
as libbpf doesn't currently support declaratively populating tail call
maps.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available, work that was started by Toke's
implementation[1]. With libbpf, iproute2 can correctly process BTF
information and support the new-style BTF-defined maps, while keeping
compatibility with the old internal map definition syntax.
The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code in bpf_legacy.c ensures that
iproute2 will still understand the old map definition format, including
populating map-in-map and tail call maps before load.
In bpf_libbpf.c, we init the iproute2 ctx and ELF info first to check the
legacy bytes. When handling the legacy maps, map-in-maps are created
manually and their fds are re-used, as they are associated with id/inner_id.
For pinned maps, we only set the pin path and let the libbpf load handle it.
For tail calls, we find them first and update the elements after prog load.
Other maps/progs are loaded by libbpf directly.
[1] https://lore.kernel.org/bpf/20190820114706.18546-1-toke@redhat.com/
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
libbpf provides direct calls for BPF program load/attach, so we can
just use two wrapper functions for ipvrf and convert them to support
libbpf.
Function bpf_prog_load() is removed as it conflicts with the libbpf
function of the same name.
bpf.c is renamed to bpf_legacy.c in preparation for the main libbpf
support in iproute2.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch aims to add basic checking functions for later iproute2
libbpf support.
First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.
Another variable LIBBPF_FORCE is used to control whether to build iproute2
with libbpf. If set to on, then force to build with libbpf and exit if
not available. If set to off, then force to not build with libbpf.
When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get correct libbpf version at runtime.
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
When the protocol is vlan, eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio, we need to check that tc_proto
is vlan, and not eth_type.
Fixes: 4c551369e0 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Note that _print_onoff() has an extra parameter for a JSON-specific flag
name. However that argument is not used, and never was. Therefore when
moving over to print_on_off(), drop this argument.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Invoke parse_on_off() from bridge_slave_parse_on_off() instead of
hand-rolling one. Exit on failure, because the invarg that was invoked here
before would.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Invoke parse_on_off() instead of rolling a custom function.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Convert bridge/link.c from a custom on_off parser to the new global one.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Ido Schimmel says:
====================
From: Ido Schimmel <idosch@nvidia.com>
Patch #1 prints the recently added 'RTNH_F_TRAP' flag.
Patch #2 makes sure that nexthop flags are always printed for nexthop
objects. Even when the nexthop does not have a device, such as a
blackhole nexthop or a group.
Example output with netdevsim:
$ ip nexthop
id 1 via 192.0.2.2 dev eth0 scope link trap
id 2 blackhole trap
id 3 group 2 trap
Example output with mlxsw:
$ ip nexthop
id 1 via 192.0.2.2 dev swp3 scope link offload
id 2 blackhole offload
id 3 group 2 offload
Tested with fib_nexthops.sh that uses "ip nexthop" output:
Tests passed: 164
Tests failed: 0
====================
Signed-off-by: David Ahern <dsahern@gmail.com>
Currently, the nexthop flags are only printed when the nexthop has a
nexthop device. The offload / trap indication is therefore not printed
for nexthop groups.
Instead, always print the nexthop flags, regardless of whether the nexthop
has a nexthop device or not.
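A minimal sketch of the control flow, not the actual ip code: the function
name, the simplified output format and the SKETCH_F_* bits are stand-ins
made up for illustration (the real flag constants are the RTNH_F_* values
from <linux/rtnetlink.h>). The point is only that the flag strings are
appended outside the branch that handles the device.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in flag bits for this sketch only. */
#define SKETCH_F_OFFLOAD	0x1
#define SKETCH_F_TRAP		0x2

/*
 * Format a simplified nexthop line into buf. dev may be NULL, as for a
 * blackhole nexthop or a group. Returns the number of characters written.
 */
int sketch_format_nexthop(char *buf, size_t len, unsigned int id,
			  const char *dev, unsigned int flags)
{
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "id %u", id);
	if (dev && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " dev %s", dev);

	/* Flags are appended unconditionally, not only in the dev branch. */
	if ((flags & SKETCH_F_OFFLOAD) && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " offload");
	if ((flags & SKETCH_F_TRAP) && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " trap");

	return n;
}
```

With dev set to NULL and the trap bit set, the buffer holds "id 2 trap",
which matches the shape of the blackhole line in the example output.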
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The kernel can now signal that a nexthop is trapping packets instead of
forwarding them. Print the flag to help users understand the offload
state of each nexthop.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Update kernel headers to commit:
f9e425e99b07 ("octeontx2-af: Add support for RSS hashing based on Transport protocol field")
Signed-off-by: David Ahern <dsahern@gmail.com>
Currently the icmp and arp parsing functions are called with incorrect
ethtype in case of vlan or cvlan filter options. In this case either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.
Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata says:
====================
The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc.
Currently a common libre tool for configuration of DCB is OpenLLDP. This
suite contains a daemon that uses Linux DCB interface to configure HW
according to the DCB TLVs exchanged over an interface. The daemon can also
be controlled by a client, through which the user can adjust and view the
configuration. The downside of using OpenLLDP is that it is somewhat
heavyweight and difficult to use in scripts, and does not support
extensions such as buffer and rate commands.
For access to many HW features, one would be perfectly fine with a
fire-and-forget tool along the lines of "ip" or "tc". For scripting in
particular, this would be ideal. This author is aware of one such tool,
mlnx_qos from Mellanox OFED scripts collection[1].
The downside here is that the tool is very verbose, the command line
language is awkward to use, it is not packaged in Linux distros, and
generally has the appearance of a very vendor-specific tool, despite not
being one.
This patchset addresses the above issues by providing a seed of a clean,
well-documented, easily usable, extensible fire-and-forget tool for DCB
configuration:
# dcb ets set dev eni1np1 \
tc-tsa all:strict 0:ets 1:ets 2:ets \
tc-bw all:0 0:33 1:33 2:34
# dcb ets show dev eni1np1 tc-tsa tc-bw
tc-tsa 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
tc-bw 0:33 1:33 2:34 3:0 4:0 5:0 6:0 7:0
# dcb ets set dev eni1np1 tc-bw 1:30 2:37
# dcb -j ets show dev eni1np1 | jq '.tc_bw[2]'
37
The patchset proceeds as follows:
- Many tools in iproute2 have an option to work in batch mode, where the
commands to run are given in a file. The code to handle batching is
largely the same independent of the tool in question. In patch #1, add a
helper to handle the batching, and migrate individual tools to use it.
- A number of configuration options come in the form of an on-off switch.
This in turn can be considered a special case of parsing one of a given
set of strings. In patch #2, extract helpers to parse one of a number of
strings, on top of which build an on-off parser.
Currently each tool open-codes the logic to parse the on-off toggle. A
future patch set will migrate instances of this code over to the new
helpers.
- The on/off toggles from the previous list item sometimes need to be dumped.
While in the FP output, one typically wishes to maintain consistency with
the command line and show actual strings, "on" and "off", in JSON output
one would rather use booleans. This logic is somewhat annoying to have to
open-code time and again. Therefore in patch #3, add a helper to do just
that.
- The DCB tool is built on top of libmnl. Several routines will be
basically the same in DCB as they are currently in devlink. In patches
#4-#6, extract them to a new module, mnl_utils, for easy reuse.
- Much of DCB is built around arrays. A syntax similar to the iplink_vlan's
ingress-qos-map / egress-qos-map is very handy for describing changes
done to such arrays. Therefore in patch #7, extract a helper,
parse_mapping(), which manages parsing of key-value arrays. In patch #8,
fix a buglet in the helper, and in patch #9, extend it to allow setting
of all array elements in one go.
- In patch #10, add a skeleton of "dcb", which contains common helpers and
dispatches to subtools for handling of individual objects. The skeleton
is empty as of this patch.
In patch #11, add "dcb_ets", a module for handling of specifically DCB
ETS objects.
The intention is to gradually add handlers for at least PFC, APP, peer
configuration, buffers and rates.
[1] https://github.com/Mellanox/mlnx-tools/tree/master/ofed_scripts
====================
Signed-off-by: David Ahern <dsahern@gmail.com>
The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc. Add a new tool to show that
configuration and tweak it.
DCB allows configuration of several objects, and possibly could expand to
pre-standard CEE interfaces. Therefore the tool itself is a lean shell that
dispatches to subtools each dedicated to one of the objects.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
The DCB tool will have to provide an interface to a number of fixed-size
arrays. Unlike the egress- and ingress-qos-map, it makes good sense to have
an interface to set all members to the same value. For example to set
strict priority on all TCs besides select few, or to reset allocated
bandwidth to all zeroes, again besides several explicitly-given ones.
To support this usage, extend the parse_mapping() with a boolean that
determines whether this special use is supported. If "all" is given and
recognized, mapping_cb is called with the key of -1.
Have iplink_vlan pass false for allow_all.
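A minimal sketch of the "all" handling, assuming a simplified interface:
sketch_parse_mapping_item() parses a single KEY:VALUE token rather than
walking argv, and both it and the sketch_fill_array() callback are
hypothetical names, not the utils.c signatures.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: key is -1 when "all" was given. */
typedef int (*sketch_mapping_cb)(int key, uint32_t value, void *data);

/*
 * Parse one "KEY:VALUE" token. When allow_all is true, the key "all" is
 * accepted and reported to the callback as -1. Returns -1 on a malformed
 * token, otherwise whatever the callback returns.
 */
int sketch_parse_mapping_item(const char *arg, bool allow_all,
			      sketch_mapping_cb cb, void *data)
{
	const char *colon;
	unsigned long value;
	char *end;
	long key;

	assert(arg != NULL && cb != NULL);

	colon = strchr(arg, ':');
	if (!colon || colon == arg || colon[1] == '\0')
		return -1;

	if (allow_all && colon - arg == 3 && strncmp(arg, "all", 3) == 0) {
		key = -1;
	} else {
		key = strtol(arg, &end, 0);
		if (end != colon || key < 0 || key > INT_MAX)
			return -1;
	}

	value = strtoul(colon + 1, &end, 0);
	if (*end != '\0' || value > UINT32_MAX)
		return -1;

	return cb((int)key, (uint32_t)value, data);
}

/* Example callback: fill an 8-entry array, treating -1 as "every entry". */
int sketch_fill_array(int key, uint32_t value, void *data)
{
	uint32_t *array = data;
	int i;

	if (key >= 8)
		return -1;
	if (key >= 0) {
		array[key] = value;
		return 0;
	}
	for (i = 0; i < 8; i++)
		array[i] = value;
	return 0;
}
```

Feeding "all:0" and then "2:34" through sketch_fill_array() resets every
entry and then overrides one of them. A caller that passes false for
allow_all, as iplink_vlan does, gets an error for "all:0" instead.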
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Currently argc and argv are not updated unless the whole mapping was
parsed successfully. As a result, when parsing fails, "ip link" points at
the wrong argument when complaining:
# ip link add name eth0.100 link eth0 type vlan id 100 egress 1:1 2:foo
Error: argument "1" is wrong: invalid egress-qos-map
Update argc and argv even in the case of parsing error, so that the right
element is indicated.
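A minimal sketch of the fix, with hypothetical names (sketch_token_ok(),
sketch_parse_tokens()) and a simplified NUMBER:NUMBER token check standing
in for the real parsing. What matters is that argc and argv are written
back on every exit path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical validity check: a token must look like "NUMBER:NUMBER". */
int sketch_token_ok(const char *arg)
{
	char *end;

	strtoul(arg, &end, 0);
	if (end == arg || *end != ':')
		return 0;
	arg = end + 1;
	strtoul(arg, &end, 0);
	return end != arg && *end == '\0';
}

/*
 * Consume mapping tokens from *argvp. A token without a colon ends the
 * mapping and is left for the caller. The caller's argc/argv are updated
 * even on failure, so (*argvp)[0] is then the token that could not be
 * parsed. Returns 0 on success and -1 on the first bad token.
 */
int sketch_parse_tokens(int *argcp, char ***argvp)
{
	int argc;
	char **argv;
	int ret = 0;

	assert(argcp != NULL && argvp != NULL);
	argc = *argcp;
	argv = *argvp;

	while (argc > 0) {
		if (!strchr(*argv, ':'))
			break;		/* not part of the mapping */
		if (!sketch_token_ok(*argv)) {
			ret = -1;
			break;
		}
		argc--, argv++;
	}

	/* Written back unconditionally, including on the error path. */
	*argcp = argc;
	*argvp = argv;
	return ret;
}
```

For the arguments "1:1 2:foo", the call fails with argv left pointing at
"2:foo", so the caller can name that element in its error message.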
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
VLAN netdevices have two similar attributes: ingress-qos-map and
egress-qos-map. These attributes can be configured with a series of
802.1-priority-to-skb-priority (and vice versa) mappings. A reusable helper
along those lines will be handy for configuration of various
priority-to-tc, tc-to-algorithm, and other arrays in DCB.
Therefore extract the logic to a function parse_mapping(), move to utils.c,
and dispatch to utils.c from iplink_vlan.c. That necessitates extraction of
a VLAN-specific parse_qos_mapping(). Do that, and propagate addattr_l()
return value up, unlike the original.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Receiving a message in libmnl is a somewhat involved operation. Devlink's
mnlg library has an implementation that is going to be handy for other
tools as well. Extract it into a new helper.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Allocation of a new netlink message with the two usual headers is reusable
with other netlink message types. Extract it into a helper,
mnlu_msg_prepare(). Take the second header as an argument, instead of
passing in parameters to initialize it, and copy it in.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
This little dance of mnl_socket_open(), option setting, and bind, is the
same regardless of tool. Extract into a new module that should hold helpers
for working with libmnl, mnl_util.c.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
The value of a number of booleans is shown as "on" and "off" in the plain
output, and as an actual boolean in JSON mode. Add a function that does
that.
RDMA tool already uses a function named print_on_off(). This function
always shows "on" and "off", even in JSON mode. Since there are probably
very few if any consumers of this interface at this point, migrate it to
the new central print_on_off() as well.
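The behaviour can be sketched as follows. format_on_off() is a hypothetical
stand-in that formats into a caller-supplied buffer; it is not the
print_on_off() signature from the patch, and the exact output layout is an
assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical formatter: plain output shows "on"/"off", JSON output
 * carries a real boolean. Returns what snprintf() returns. */
static int format_on_off(char *buf, size_t len, bool json,
			 const char *key, bool value)
{
	if (json)
		return snprintf(buf, len, "\"%s\":%s", key,
				value ? "true" : "false");
	return snprintf(buf, len, "%s %s", key, value ? "on" : "off");
}
```

A JSON consumer then receives a boolean it can test directly instead of
comparing against the strings "on" and "off".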
Signed-off-by: Petr Machata <me@pmachata.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Take parse_one_of() from the macsec code and adapt it so that it passes the
primary result as the main return value, and the error result through a
pointer. That is the simplest way to make the code reusable across data
types without introducing extra magic.
Also from macsec take the specialization of parse_one_of() for parsing
specifically the strings "off" and "on".
Convert the macsec code to the new helpers.
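A minimal sketch of that calling convention. one_of() and on_off() are
hypothetical names, and exact string comparison is an assumption; the real
helpers also take a message for error reporting, which is left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the index of arg in list as the primary result; report
 * success (0) or failure (-1) through *p_err. */
static int one_of(const char *arg, const char *const *list, size_t len,
		  int *p_err)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (strcmp(arg, list[i]) == 0) {
			*p_err = 0;
			return (int)i;
		}
	}
	*p_err = -1;
	return 0;
}

/* Specialization for "off"/"on": the index doubles as the boolean. */
static bool on_off(const char *arg, int *p_err)
{
	static const char *const values[] = { "off", "on" };

	return one_of(arg, values, 2, p_err) != 0;
}
```

Because the error travels through the pointer, the return type is free to
be an index, a bool, or an enum without reserving a value for failure.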
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
The code for handling batches is largely the same across iproute2 tools.
Extract a helper to handle the batch, and adjust the tools to dispatch to
this helper. Sandwich the invocation between prologue / epilogue code
specific for each tool.
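The shape of such a helper can be sketched as below. run_batch(),
batch_cmd_t and count_cmd() are hypothetical names; the whitespace
tokenizer, the line-length limit and the comment handling are
simplifications, not the behaviour of the real batch code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BATCH_MAX_ARGS 32

/* Per-tool command callback; user carries tool-specific state. */
typedef int (*batch_cmd_t)(int argc, char **argv, void *user);

/* Read commands line by line and dispatch each one to cmd.
 * Returns 0 if every command succeeded, 1 otherwise. With force set,
 * processing continues after a failing command. */
static int run_batch(FILE *in, bool force, batch_cmd_t cmd, void *user)
{
	char line[512];
	int ret = 0;

	while (fgets(line, sizeof(line), in)) {
		char *argv[BATCH_MAX_ARGS];
		int argc = 0;
		char *tok;

		for (tok = strtok(line, " \t\n");
		     tok && argc < BATCH_MAX_ARGS;
		     tok = strtok(NULL, " \t\n"))
			argv[argc++] = tok;

		/* Skip empty lines and comment lines. */
		if (argc == 0 || argv[0][0] == '#')
			continue;

		if (cmd(argc, argv, user) != 0) {
			ret = 1;
			if (!force)
				break;
		}
	}
	return ret;
}

/* Example callback: counts commands and rejects the word "bad". */
static int count_cmd(int argc, char **argv, void *user)
{
	int *count = user;

	(void)argc;
	(*count)++;
	return strcmp(argv[0], "bad") == 0 ? -1 : 0;
}
```

A tool would run its own prologue (for example opening its netlink socket)
before calling run_batch() and its epilogue afterwards, so only the loop
itself is shared.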
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Manpage:
* Remove the extra "and to ip packets" part from command description
to make it more understandable.
* Redirect packets to eth1, instead of eth0, as told in the
description.
Help string:
* "mpls pop" can be followed by a CONTROL keyword.
* "mpls modify" can also set the MPLS_BOS field.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
* "vlan pop" can be followed by a CONTROL keyword.
* Add missing space in error message.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vlad Buslov says:
====================
Implement support for terse dump mode which provides only essential
classifier/action info (handle, stats, cookie, etc.). Use new
TCA_DUMP_FLAGS_TERSE flag to prevent copying of unnecessary data from
kernel.
====================
Signed-off-by: David Ahern <dsahern@gmail.com>
Implement support for classifier/action terse dump using the new
TCA_DUMP_FLAGS TLV, whose only available flag value is TCA_DUMP_FLAGS_TERSE.
Set the flag when the user requests it, as in the following example CLI
(-br for 'brief'):
$ tc -s -br filter show dev ens1f0 ingress
filter protocol ip pref 49151 flower chain 0
filter protocol ip pref 49151 flower chain 0 handle 0x1
not_in_hw
action order 1: gact Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
filter protocol ip pref 49152 flower chain 0
filter protocol ip pref 49152 flower chain 0 handle 0x1
not_in_hw
action order 1: gact Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
In terse mode, the dump outputs only the essential data needed to identify
the filter and action (handle, cookie, etc.) and stats, if requested by the
user. The intention is to significantly improve rule dump rate by omitting
all static data that does not change after the rule is created.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Modify implementations that return error from action_util->print_aopt()
callback to silently skip actions that don't have their corresponding
TCA_ACT_OPTIONS attribute set (some actions already behave like this).
Print action kind before returning from action_util->print_aopt()
callbacks. This is necessary to support terse dump mode in the following
patch in the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
On some systems (e.g. current Debian/stable) the inclusion
of utils.h pulled in some other things that may end up
defining __aligned, in a possibly different way than what
we had here.
Use our own definition only if there isn't one already.
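The guard amounts to the usual #ifndef pattern. A minimal sketch, assuming
a GCC/Clang style attribute as the fallback; struct demo_buf is a
hypothetical type that only shows the macro in use.

```c
#include <stddef.h>

/* Only provide a fallback when no earlier header defined the macro. */
#ifndef __aligned
#define __aligned(x) __attribute__((aligned(x)))
#endif

/* Illustrative user of the macro: a small buffer aligned to 8 bytes. */
struct demo_buf {
	char data[3];
} __aligned(8);
```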
Fixes: d5acae244f ("libnetlink: add nl_print_policy() helper")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 02a261b5ba ("m_mpls: add mac_push action") added a matches()
test for the "mac_push" string before the test for "modify".
This changes the previous behaviour as 'action m' used to match
"modify" while it now matches "mac_push".
Revert to the original behaviour by moving the "mac_push" test after
"modify".
Fixes: 02a261b5ba ("m_mpls: add mac_push action")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
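The ordering problem fixed by the mac_push commit above comes from
abbreviation matching, where the first keyword that the argument is a
prefix of wins. A minimal sketch, assuming a hypothetical is_abbrev()
helper rather than the real matches(), and an illustrative keyword list:

```c
#include <string.h>

/* True when arg is a non-empty prefix of keyword. */
static int is_abbrev(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	return len > 0 && strncmp(arg, keyword, len) == 0;
}

/* Keywords are tried in order and the first hit wins, so the order of
 * the tests decides what an ambiguous abbreviation resolves to. */
static const char *first_match(const char *arg,
			       const char *const *keywords, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (is_abbrev(arg, keywords[i]))
			return keywords[i];
	return NULL;
}
```

With "modify" tested before "mac_push", the abbreviation "m" resolves to
"modify" again, while "mac" still selects "mac_push".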
Tuong Lien says:
====================
This series adds two new options to the 'iproute2/tipc' command, enabling
users to use the new TIPC encryption features, i.e. the master key and
rekeying, which have recently been merged in the kernel.
The help menu of the "tipc node set key" command is also updated accordingly:
# tipc node set key --help
Usage: tipc node set key KEY [algname ALGNAME] [PROPERTIES]
tipc node set key rekeying REKEYING
KEY
Symmetric KEY & SALT as a composite ASCII or hex string (0x...) in form:
[KEY: 16, 24 or 32 octets][SALT: 4 octets]
ALGNAME
Cipher algorithm [default: "gcm(aes)"]
PROPERTIES
master - Set KEY as a cluster master key
<empty> - Set KEY as a cluster key
nodeid NODEID - Set KEY as a per-node key for own or peer
REKEYING
INTERVAL - Set rekeying interval (in minutes) [0: disable]
now - Trigger one (first) rekeying immediately
EXAMPLES
tipc node set key this_is_a_master_key master
tipc node set key 0x746869735F69735F615F6B657931365F73616C74
tipc node set key this_is_a_key16_salt algname "gcm(aes)" nodeid 1001002
tipc node set key rekeying 600
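The composite format in the help text ([KEY: 16, 24 or 32 octets][SALT: 4
octets], given as ASCII or as a 0x-prefixed hex string) can be checked with
a few lines of C. This is only a sketch: composite_key_len() is a
hypothetical helper, not the validation done by the tipc tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SALT_LEN 4

/* Return the KEY length in octets (16, 24 or 32) when str is a valid
 * composite KEY+SALT string, or -1 otherwise. */
static int composite_key_len(const char *str)
{
	size_t octets;

	if (strncmp(str, "0x", 2) == 0) {
		size_t n = strlen(str + 2);
		size_t i;

		/* Hex form: two digits per octet. */
		if (n == 0 || n % 2 != 0)
			return -1;
		for (i = 0; i < n; i++)
			if (!isxdigit((unsigned char)str[2 + i]))
				return -1;
		octets = n / 2;
	} else {
		/* ASCII form: one character per octet. */
		octets = strlen(str);
	}

	if (octets <= SALT_LEN)
		return -1;

	switch (octets - SALT_LEN) {
	case 16:
	case 24:
	case 32:
		return (int)(octets - SALT_LEN);
	default:
		return -1;
	}
}
```

Each key in the EXAMPLES above is 20 octets long, that is a 16-octet KEY
followed by the 4-octet SALT; the hex example encodes the ASCII string
this_is_a_key16_salt.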
====================
Signed-off-by: David Ahern <dsahern@gmail.com>
As supported in the kernel, TIPC encryption rekeying can be tuned using
the netlink attribute 'TIPC_NLA_NODE_REKEYING'. Add the corresponding
'rekeying' option to the 'tipc node set key' command so that users can
perform that tuning:
tipc node set key rekeying REKEYING
where the 'REKEYING' value can be:
INTERVAL - Set rekeying interval (in minutes) [0: disable]
now - Trigger one (first) rekeying immediately
For example:
$ tipc node set key rekeying 60
$ tipc node set key rekeying now
The command's help menu is also updated with these descriptions for the
new command option.
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
In addition to the support of the master key in the kernel, add the
'master' option to the 'tipc node set key' command so that users can
specify a key as a master key when setting it. This is carried out by
turning on the new netlink flag 'TIPC_NLA_NODE_KEY_MASTER'.
For example:
$ tipc node set key "this_is_a_master_key" master
The command's help menu is also updated to give a better description of
all the available options.
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>