This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
The out of date documentation was removed in 2017, but the instructions
in the README were not removed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Machata says:
====================
Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
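
The indirection described above can be sketched in C. This is only an
illustration: the structure, the function names, the fixed table size and the
use of a modulo to pick a bucket are all assumptions made for the example, not
the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS 8	/* hypothetical table size */

/* Hypothetical table: each bucket stores the ID of the next hop it uses. */
struct res_table {
	uint32_t bucket_nhid[NUM_BUCKETS];
};

/* The hash selects a bucket; the bucket selects the next hop. */
static uint32_t res_table_lookup(const struct res_table *t, uint32_t hash)
{
	assert(t != NULL);
	return t->bucket_nhid[hash % NUM_BUCKETS];
}

/* Reassign a single bucket; flows hashing to other buckets are unaffected. */
static void res_table_migrate(struct res_table *t, size_t index, uint32_t nhid)
{
	assert(t != NULL && index < NUM_BUCKETS);
	t->bucket_nhid[index] = nhid;
}
```

Because only one bucket changes per migration, a next hop can gain or lose
hash space in bucket-sized steps instead of having a whole range recomputed.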
In this patch set, introduce support for resilient next-hop groups to
iproute2.
- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.
- Patches #2 and #3 add new helpers that will be useful later.
- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
command line argument, and to dispatch based on the specified type.
- Patch #5 adds the support for resilient next-hop groups.
- Patch #6 adds the support for resilient next-hop group bucket interface.
To illustrate the usage, consider the following commands:
# ip nexthop add id 1 via 192.0.2.2 dev dummy1
# ip nexthop add id 2 via 192.0.2.3 dev dummy1
# ip nexthop add id 10 group 1/2 type resilient \
buckets 8 idle_timer 60 unbalanced_timer 300
The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.
And this is how the next-hop group bucket interface looks:
# ip nexthop bucket show id 10
id 10 index 0 idle_time 5.59 nhid 1
id 10 index 1 idle_time 5.59 nhid 1
id 10 index 2 idle_time 8.74 nhid 2
id 10 index 3 idle_time 8.74 nhid 2
id 10 index 4 idle_time 8.74 nhid 1
id 10 index 5 idle_time 8.74 nhid 1
id 10 index 6 idle_time 8.74 nhid 1
id 10 index 7 idle_time 8.74 nhid 1
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add ability to dump multiple nexthop buckets and get a specific one.
Example:
# ip nexthop add id 10 group 1/2 type resilient buckets 8
# ip nexthop
id 1 via 192.0.2.2 dev dummy10 scope link
id 2 via 192.0.2.19 dev dummy20 scope link
id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
# ip nexthop bucket
id 10 index 0 idle_time 28.1 nhid 2
id 10 index 1 idle_time 28.1 nhid 2
id 10 index 2 idle_time 28.1 nhid 2
id 10 index 3 idle_time 28.1 nhid 2
id 10 index 4 idle_time 28.1 nhid 1
id 10 index 5 idle_time 28.1 nhid 1
id 10 index 6 idle_time 28.1 nhid 1
id 10 index 7 idle_time 28.1 nhid 1
# ip nexthop bucket show nhid 1
id 10 index 4 idle_time 53.59 nhid 1
id 10 index 5 idle_time 53.59 nhid 1
id 10 index 6 idle_time 53.59 nhid 1
id 10 index 7 idle_time 53.59 nhid 1
# ip nexthop bucket get id 10 index 5
id 10 index 5 idle_time 81 nhid 1
# ip -j -p nexthop bucket get id 10 index 5
[ {
"id": 10,
"bucket": {
"index": 5,
"idle_time": 104.89,
"nhid": 1
},
"flags": [ ]
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add ability to configure resilient nexthop groups and show their current
configuration. Example:
# ip nexthop add id 10 group 1/2 type resilient buckets 8
# ip nexthop show id 10
id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
# ip -j -p nexthop show id 10
[ {
"id": 10,
"group": [ {
"id": 1
},{
"id": 2
} ],
"type": "resilient",
"resilient_args": {
"buckets": 8,
"idle_timer": 120,
"unbalanced_timer": 0
},
"flags": [ ]
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.
These two commands are equivalent:
# ip nexthop add id 10 group 1/2/3
# ip nexthop add id 10 group 1/2/3 type mpath
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what is
usually done and returns the parsed NH ID.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().
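
A minimal sketch of the conversion step, assuming the usual seconds plus
microseconds layout of struct timeval. The function name is hypothetical and
the printing side is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Convert a timeval to a number of seconds expressed as a double. */
static double timeval_to_double(const struct timeval *tv)
{
	assert(tv != NULL);
	return (double)tv->tv_sec + (double)tv->tv_usec / 1000000.0;
}
```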
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Several functions in bpf_glue.c and bpf_libbpf.c rely on PATH_MAX, which is
normally included from <limits.h> in other iproute2 source files.
Including <limits.h> explicitly fixes errors seen using gcc 10.2.0,
binutils 2.35.1 and musl 1.1.24:
bpf_glue.c: In function 'get_libbpf_version':
bpf_glue.c:46:11: error: 'PATH_MAX' undeclared (first use in this function);
did you mean 'AF_MAX'?
46 | char buf[PATH_MAX], *s;
| ^~~~~~~~
| AF_MAX
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Security context names are not guaranteed to be NUL-terminated by the
kernel, so we can't just print them using %s directly. The length of
the string is determined by sctx->ctx_len, so we can use that to limit
what fprintf outputs.
While at it, factor that out to a separate function, since the exact
same code is used to print the security context for both policies and
states.
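
The length-limited print can be sketched with the "%.*s" conversion, which
reads at most the given number of bytes and so does not need a terminating
NUL. The structure and function below are hypothetical stand-ins, not the
xfrm code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a length-prefixed, non-NUL-terminated name. */
struct sec_ctx {
	unsigned short ctx_len;
	const char *ctx_str;
};

/* Format at most ctx_len bytes of the name using the "%.*s" precision. */
static int format_sec_ctx(char *out, size_t outlen, const struct sec_ctx *sctx)
{
	assert(out != NULL && sctx != NULL);
	return snprintf(out, outlen, "%.*s", (int)sctx->ctx_len, sctx->ctx_str);
}
```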
Fixes: b2bb289a57 ("xfrm security context support")
Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, causing negative values to be printed as
large unsigned values instead. In addition, we passed a negative value to
sprint_time() even though it expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.
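
A small sketch of both halves of the fix, using hypothetical helper names and
assuming 32-bit values: print the signed value with a signed conversion, and
split a possibly negative time into a sign and an unsigned magnitude before
handing it to a formatter that expects an unsigned argument.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a signed value with a signed conversion specifier. */
static int format_deficit(char *out, size_t outlen, int32_t deficit)
{
	assert(out != NULL);
	return snprintf(out, outlen, "%" PRId32, deficit);
}

/* Return the magnitude of t and report its sign through *negative. */
static uint32_t time_magnitude(int32_t t, int *negative)
{
	assert(negative != NULL);
	*negative = t < 0;
	/* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
	return t < 0 ? (uint32_t)0 - (uint32_t)t : (uint32_t)t;
}
```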
Fixes: 714444c0cb ("Add support for CAKE qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Older distros need bsd/stdlib.h for reallocarray(), but newer distros
don't. Older distros also need libbsd-devel installed, while newer ones
don't. To remove a possible dependency on libbsd-devel, replace the use
of reallocarray() with realloc().
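
What reallocarray() adds over realloc() is a check that the element count
times the element size does not overflow. A portable sketch of that behaviour
follows. The wrapper name is hypothetical, and whether the patch keeps such a
check is an assumption of this example, not something stated here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* realloc() plus a multiplication overflow check, without libbsd. */
static void *grow_array(void *ptr, size_t nmemb, size_t size)
{
	assert(size != 0);	/* element size must be non-zero */
	if (nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```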
dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did you mean ‘realloc’?
Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
format_host_rta_r might return a cached hostname
via its return value and not use the input buffer.
Before:
$ ip -resolve -6 route
dev lo proto kernel metric 256 pref medium
After:
$ ip/ip -resolve -6 route
localhost dev lo proto kernel metric 256 pref medium
Bug-Debian: https://bugs.debian.org/983591
Reported-by: Axel Scheepers <axel.scheepers76@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When the user specifies either an unknown flavour or an unknown state in
devlink port commands, return an appropriate error message.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Introduce a helper for generic socket receive and a helper to build a
command with a custom family and version.
These are used in a subsequent devlink patch.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Use the helper routines provided by the library for counting slashes and
splitting a string on a delimiter.
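
For illustration, helpers of this kind could look as follows. The names and
signatures are hypothetical and are not claimed to match the library routines
the patch switches to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many times c occurs in str. */
static unsigned int count_char(const char *str, char c)
{
	unsigned int n = 0;

	assert(str != NULL);
	for (; *str; str++)
		if (*str == c)
			n++;
	return n;
}

/* Split str in place at the first delim; return -1 if there is none. */
static int split_at(char *str, char delim, char **before, char **after)
{
	char *pos;

	assert(str != NULL && before != NULL && after != NULL);
	pos = strchr(str, delim);
	if (!pos)
		return -1;
	*pos = '\0';
	*before = str;
	*after = pos + 1;
	return 0;
}
```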
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The feature is supported by the kernel since 5.11-net-next,
let's allow user-space to use it.
Just parse and dump an additional per-endpoint u16 attribute.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that by making the calls conditional on the file descriptors being
valid.
get_cgroup2_path() may also return NULL while leaking a file descriptor.
Ensure this does not happen by using a single return point.
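
The cleanup pattern can be sketched on a hypothetical helper unrelated to
cgroups: every exit goes through one label, and close() is called only when
the descriptor is valid.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read the first byte of a file, or return -1. */
static int read_first_byte(const char *path)
{
	unsigned char c;
	int ret = -1;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto out;

	if (read(fd, &c, 1) != 1)
		goto out;

	ret = c;
out:
	/* Single return point: close only a valid descriptor. */
	if (fd >= 0)
		close(fd);
	return ret;
}
```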
Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Fixes: 8f1cd119b3 ("lib: fix checking of returned file handle size for cgroup")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The make_path() function calls mkdir() twice in a row. The first call
stores the mkdir() return code, and the second is made only to check errno.
This is unnecessary, as we can use the return code from the first
call and check errno if it is not 0.
Fixes: ac3415f5c1 ("lib/fs: Fix and simplify make_path()")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
As stated in commit ac3415f5c1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racy, because the entry might change in
between.
As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.
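
A minimal sketch of the approach, with a hypothetical function name: call
mkdir() without a preceding stat() and treat EEXIST as success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a directory; an already existing entry is not an error. */
static int ensure_dir(const char *path, mode_t mode)
{
	assert(path != NULL);
	if (mkdir(path, mode) != 0 && errno != EEXIST)
		return -1;
	return 0;
}
```

Since there is no separate existence check, nothing can change between the
check and the creation.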
Fixes: 95ae9a4870 ("bpf: fix mnt path when from env")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
When ip -all netns {del,exec} are called and no netns is present, ip
exits with status 0. However, this does not happen if no netns has been
created since boot time: in that case, the NETNS_RUN_DIR is not
present and netns_foreach() exits with code 1.
$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
. ..
This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same thing (in this case, nothing), but
exits with two different statuses.
Fix this by treating ENOENT differently from other errors, similarly
to what we already do in ipnetns.c netns_identify_pid().
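
The idea can be sketched on a hypothetical directory walker: a missing
directory is reported as "nothing to do", while any other opendir() failure
is still an error. This is not the netns_foreach() code.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Count directory entries; a missing directory counts as empty. */
static int count_entries(const char *dir_path)
{
	struct dirent *entry;
	int count = 0;
	DIR *dir;

	assert(dir_path != NULL);

	dir = opendir(dir_path);
	if (!dir)
		return errno == ENOENT ? 0 : -1;

	while ((entry = readdir(dir)) != NULL) {
		if (strcmp(entry->d_name, ".") == 0 ||
		    strcmp(entry->d_name, "..") == 0)
			continue;
		count++;
	}
	closedir(dir);
	return count;
}
```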
Fixes: e998e118dd ("lib: Exec func on each netns")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.
Achieve this by simply checking the rtnl_rttable_a2n return value, as we
already do in the rest of iproute.
Fixes: 0486388a87 ("add support for table name in SRv6 End.DT* behaviors")
Fixes: 69629b4e43 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer length in its
implementation; however, m_gate uses shorter buffers when calling it.
Fix this by using the SPRINT_BUF macro to declare the buffer, thus getting
a SPRINT_BSIZE-long buffer.
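
The contract between such a formatter and its callers can be sketched as
follows. FMT_BSIZE, FMT_BUF and format_ns() are hypothetical stand-ins chosen
for this example. The point is only that a formatter assuming a fixed size is
safe when every caller declares its buffer through the matching macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FMT_BSIZE 64			/* hypothetical fixed buffer size */
#define FMT_BUF(x) char x[FMT_BSIZE]	/* declares a buffer of that size */

/* Formatter that assumes its caller provides FMT_BSIZE bytes. */
static char *format_ns(long long ns, char *buf)
{
	assert(buf != NULL);
	snprintf(buf, FMT_BSIZE - 1, "%lldns", ns);
	return buf;
}
```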
Fixes: 07d5ee70b5 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Talking to various people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The "usually hardware" and "usually software" distinctions make no
sense. Try to clarify what these do based on the actual kernel behavior.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The bridge program does:
fdb_modify:
/* Assume self */
if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
req.ndm.ndm_flags |= NTF_SELF;
which is clearly against the documented behavior. The only thing we can
do, sadly, is update the documentation.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Explaining the "local" flag by saying that it is "a local permanent fdb
entry" is not very helpful. Be more specific.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The bridge does this:
fdb_modify:
/* Assume permanent */
if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
req.ndm.ndm_state |= NUD_PERMANENT;
So let's make the user aware of the fact that if they don't want local
entries, they need to specify some other flag like "static".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The bridge program parses "local" and "permanent" in just the same way,
so it makes sense to tell that to users:
fdb_modify:
} else if (matches(*argv, "local") == 0 ||
matches(*argv, "permanent") == 0) {
req.ndm.ndm_state |= NUD_PERMANENT;
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.
Let's check the user input and ensure that arguments are valid.
Fixes: a6d0773ebe ("rdma: Add stat manual mode support")
Signed-off-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".
This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.
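
A sketch of a parser that accepts both spellings, assuming a plain decimal
port. The function name is hypothetical, and no upper bound is enforced
because the valid range depends on the family.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a port written as "80" or ":80"; return -1 on malformed input. */
static long parse_port(const char *s)
{
	char *end;
	long port;

	assert(s != NULL);

	if (*s == ':')
		s++;	/* the leading ':' is optional */
	if (*s == '\0')
		return -1;

	port = strtol(s, &end, 10);
	if (*end != '\0' || port < 0)
		return -1;
	return port;
}
```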
Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add implementation for the port parameters
getting/setting.
Add bash completion for port param.
Add man description for port param.
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David Ahern <dsahern@kernel.org>
Parav Pandit says:
====================
The Linux vdpa interface provides vdpa device management functionality.
This includes adding, removing and querying vdpa devices.
The vdpa interface also shows the management devices
which support such operations.
This patchset includes kernel uapi headers and a vdpa tool.
examples:
$ vdpa mgmtdev show
vdpasim:
supported_classes net
$ vdpa mgmtdev show -jp
{
"show": {
"vdpasim": {
"supported_classes": [ "net" ]
}
}
}
Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:
$ vdpa dev add mgmtdev vdpasim_net name foo2
Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256
$ vdpa dev show foo2 -jp
{
"dev": {
"foo2": {
"type": "network",
"mgmtdev": "vdpasim_net",
"vendor_id": 0,
"max_vqs": 2,
"max_vq_size": 256
}
}
}
Delete the vdpa device after its use:
$ vdpa dev del foo2
An example of PCI PF, VF and SF management device:
pci/0000:03.00:0
supported_classes
net
pci/0000:03.00:4
supported_classes
net
auxiliary/mlx5_core.sf.8
supported_classes
net
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a vdpa tool to create, delete and query vdpa devices.
examples:
Show the vdpa management device that supports creating and deleting vdpa
devices.
$ vdpa mgmtdev show
vdpasim:
supported_classes net
$ vdpa mgmtdev show -jp
{
"show": {
"vdpasim": {
"supported_classes": [ "net" ]
}
}
}
Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:
$ vdpa dev add mgmtdev vdpasim_net name foo2
Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256
$ vdpa dev show foo2 -jp
{
"dev": {
"foo2": {
"type": "network",
"mgmtdev": "vdpasim_net",
"vendor_id": 0,
"max_vqs": 2,
"max_vq_size": 256
}
}
}
Delete the vdpa device after its use:
$ vdpa dev del foo2
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A subsequent patch needs to map a string to an unsigned int.
Hence, add an API to map a string to an unsigned int.
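
Such a mapping is commonly implemented as a lookup over a table terminated by
a NULL string. The sketch below uses hypothetical type and function names and
does not claim to reproduce the API added by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical map entry; a NULL str terminates the table. */
struct str_uint_map {
	const char *str;
	unsigned int num;
};

/* Store the number mapped to str in *num, or return -EINVAL. */
static int str_uint_lookup(const struct str_uint_map *map, const char *str,
			   unsigned int *num)
{
	assert(map != NULL && str != NULL && num != NULL);
	for (; map->str; map++) {
		if (strcmp(map->str, str) == 0) {
			*num = map->num;
			return 0;
		}
	}
	return -EINVAL;
}
```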
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A subsequent patch needs to
(a) query and use a socket family
(b) send/receive messages using this family
Hence add helper routines to open, close and query the family and to
perform send and receive operations.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A subsequent patch needs to use 2-character indentation for nested objects.
Hence introduce generic helpers to allocate, deallocate, increment,
decrement and print an indent block.
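
A minimal sketch of the increment and decrement helpers, assuming a fixed
maximum depth and a 2-character step. All names and limits are hypothetical,
and allocation and printing are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INDENT_STEP 2		/* characters added per nesting level */
#define INDENT_MAXLEN 32	/* hypothetical upper bound */

struct indent_block {
	int level;			/* current indentation in characters */
	char str[INDENT_MAXLEN + 1];	/* that many spaces, NUL-terminated */
};

/* Go one nesting level deeper, unless the maximum is reached. */
static void indent_block_inc(struct indent_block *b)
{
	assert(b != NULL);
	if (b->level + INDENT_STEP > INDENT_MAXLEN)
		return;
	b->level += INDENT_STEP;
	memset(b->str, ' ', sizeof(b->str));
	b->str[b->level] = '\0';
}

/* Go one nesting level up, unless already at the top. */
static void indent_block_dec(struct indent_block *b)
{
	assert(b != NULL);
	if (b->level - INDENT_STEP < 0)
		return;
	b->level -= INDENT_STEP;
	b->str[b->level] = '\0';
}
```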
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add kernel headers as of the following commit from kernel tree [1]:
6acba4951632 ("vdpa_sim_net: Add support for user supported devices")
[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>