This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:
$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0
where:
- "trace" and "prealloc" may look redundant, but they anticipate future
implementations of other IOAM option types.
- "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
the corresponding uapi).
- "ns" is an IOAM namespace ID attached to the pre-allocated trace.
- "size" is the pre-allocated trace size in bytes; it must be a multiple of
4 octets and is limited in size (see IOAM6_TRACE_DATA_SIZE_MAX).
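The constraints on "size" can be sketched as a small check. This is only an illustration: ioam6_trace_size_ok() is a hypothetical helper, the rejection of a zero size is an assumption, and the value used for the maximum (244) is assumed here and should be confirmed against IOAM6_TRACE_DATA_SIZE_MAX in the uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; the authoritative definition is
 * IOAM6_TRACE_DATA_SIZE_MAX in the uapi header. */
#define EXAMPLE_IOAM6_TRACE_DATA_SIZE_MAX 244

/* Hypothetical helper: accept a size that is non-zero (assumption),
 * a multiple of 4 octets, and not larger than the maximum. */
bool ioam6_trace_size_ok(uint32_t size)
{
	if (size == 0 || size % 4 != 0)
		return false;
	return size <= EXAMPLE_IOAM6_TRACE_DATA_SIZE_MAX;
}
```

With these assumptions, the "size 12" used in the command above passes, while a size such as 10 is rejected because it is not a multiple of 4.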
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).
$ ip ioam
Usage: ip ioam { COMMAND | help }
ip ioam namespace show
ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
ip ioam namespace del ID
ip ioam schema show
ip ioam schema add ID DATA
ip ioam schema del ID
ip ioam namespace set ID schema { ID | none }
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")
Signed-off-by: David Ahern <dsahern@kernel.org>
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.
$ ip -br neigh
172.16.12.100 bridge0 b0:fc:36:2f:07:43
172.16.12.174 bridge0 8c:16:45:2f:bc:1c
172.16.12.250 bridge0 04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d bridge0 b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c bridge0 8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74 bridge0 04:d9:f5:c1:0c:74
And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
devlink currently uses "%lu" to format values of type uint64_t,
but on 32-bit architectures uint64_t is defined as unsigned
long long and this does not work correctly.
Fix this by using the standard macro PRIu64 instead.
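As a minimal sketch of the idea (format_u64() is a hypothetical helper, not a devlink function), PRIu64 expands to the conversion specifier that matches uint64_t on the target, so the same format string works on 32-bit and 64-bit builds:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value portably. PRIu64 supplies the right length
 * modifier for uint64_t, whatever type it is defined as.
 * Returns what snprintf() returns. */
int format_u64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%" PRIu64, val);
}
```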
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink and vdpa use BIT() together with 64-bit flag fields. devlink
is already using bit numbers greater than 31 and so does not work
correctly on 32-bit architectures.
Fix this by making BIT() use uint64_t instead of unsigned long.
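The problem and the fix can be illustrated with a sketch like the following. EXAMPLE_BIT() and flag_is_set() are hypothetical names used for illustration and are not claimed to match the exact definition in the tree:

```c
#include <stdint.h>

/* 64-bit safe variant: the shifted constant is always 64 bits wide,
 * so bit numbers above 31 are well defined on 32-bit targets too. */
#define EXAMPLE_BIT(nr) (UINT64_C(1) << (nr))

/* Returns non-zero when bit nr is set in a 64-bit flag field. */
int flag_is_set(uint64_t flags, unsigned int nr)
{
	return (flags & EXAMPLE_BIT(nr)) != 0;
}
```

With an unsigned long based macro, shifting by 32 or more would be undefined where unsigned long is 32 bits wide.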
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
On some systems linking fails because the math library is missing.
Add -lm to devlink.
LINK devlink
../lib/libutil.a(utils_math.o): In function `get_rate':
utils_math.c:(.text+0xcc): undefined reference to `floor'
../lib/libutil.a(utils_math.o): In function `get_size':
utils_math.c:(.text+0x384): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:16: devlink] Error 1
make: *** [Makefile:64: all] Error 2
Fixes: 6c70aca76e ("devlink: Add port func rate support")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Implement a decrement operation for ttl and hoplimit.
Since this is just syntactic sugar, it follows that:
tc filter add ... action pedit ex munge ip ttl dec ...
tc filter add ... action pedit ex munge ip6 hoplimit dec ...
is just a more readable version of this:
tc filter add ... action pedit ex munge ip ttl add 0xff ...
tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...
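The equivalence relies on wraparound in a one-byte field: adding 0xff modulo 256 gives the same result as subtracting 1. A small sketch of that arithmetic follows; u8_add() is a hypothetical helper, and it assumes the addition wraps within the single byte:

```c
#include <stdint.h>

/* Add to a one-byte field with wraparound modulo 256. */
uint8_t u8_add(uint8_t field, uint8_t val)
{
	return (uint8_t)(field + val);
}
```

Under that assumption a ttl of 64 becomes 63, and a value of 0 would wrap around to 255.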
This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but wasn't present in either their mlnx-iproute2
or iproute2.
Tested with skip_sw on Mellanox ConnectX-6 Dx.
[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989
v3:
- Use dedicated flags argument in parse_cmd() (David Ahern)
- Minor rewording of the man page
v2:
- Fix whitespace issue (Stephen Hemminger)
- Add to usage info in explain()
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch just prepares the flags argument, so it's
available to the next patch.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
The WWAN subsystem has been extended to generalize the per data channel
network interfaces management. This change implements support for WWAN
link handling, and actively uses the earlier introduced ip-link
capability to specify the parent by its device name.
The WWAN interface for a new data channel should be created with a
command like this:
ip link add dev wwan0-2 parentdev wwan0 type wwan linkid 2
Where: wwan0 is the modem HW device name (should be taken from
/sys/class/wwan) and linkid is an identifier of the opened data
channel.
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for specifying a parent device (struct device) by its name
during the link creation and printing parent name in the links list.
This option will be used to create WWAN links and possibly by other
device classes that do not have a "natural parent netdev".
Add the parent device bus name printing for links list info
completeness. But do not add a corresponding command line argument, as
we do not have a use case for this attribute.
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The ip link property add/delete requires a device, but the
device argument was not shown in the man page.
It is correct in the usage message.
Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We introduce the new "End.DT46" action for supporting the SRv6 End.DT46
Behavior in iproute2.
The SRv6 End.DT46 Behavior, defined in RFC 8986 [1] section 4.8, can be
used to implement L3 VPNs based on Segment Routing over IPv6 networks in
multi-tenant environments and it is capable of handling both IPv4 and
IPv6 tenant traffic at the same time.
The SRv6 End.DT46 Behavior decapsulates the received packets and it
performs the IPv4 or IPv6 routing lookup in the routing table of the
tenant.
As for the End.DT4 and for the End.DT6 in VRF mode, the SRv6 End.DT46
Behavior leverages a VRF device in order to force the routing lookup into
the associated routing table using the "vrftable" attribute.
To make the End.DT46 work properly, it must be guaranteed that the
routing table used for routing lookup operations is bound to one and
only one VRF during the tunnel creation. Such constraint has to be
enforced by enabling the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the End.DT4 Behavior and for the
End.DT6 Behavior in VRF mode.
An SRv6 End.DT46 Behavior instance can be created as follows:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100
Standard Output:
$ ip -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100 metric 1024 pref medium
JSON Output:
$ ip -6 -j -p route show 2001:db8::1
[ {
"dst": "2001:db8::1",
"encap": "seg6local",
"action": "End.DT46",
"vrftable": 100,
"dev": "vrf100",
"metric": 1024,
"flags": [ ],
"pref": "medium"
} ]
This patch updates the route.8 man page and the ip route help with the
information related to End.DT46.
Considering that the same information was missing for the SRv6 End.DT4 and
the End.DT6 Behaviors, we have also added it.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
Dmytro Linkin says:
====================
Series implements devlink rate commands, which are:
- Dump particular or all rate objects (JSON or non-JSON)
- Add/Delete node rate object
- Set tx rate share/max values for rate object
- Set/Unset parent rate object for other rate object
Examples:
Display all rate objects:
# devlink port function rate show
pci/0000:03:00.0/1 type leaf parent some_group
pci/0000:03:00.0/2 type leaf tx_share 12Mbit
pci/0000:03:00.0/some_group type node tx_share 1Gbps tx_max 5Gbps
Display leaf rate object bound to the 1st devlink port of the
pci/0000:03:00.0 device:
# devlink port function rate show pci/0000:03:00.0/1
pci/0000:03:00.0/1 type leaf
Display node rate object with name some_group of the pci/0000:03:00.0
device:
# devlink port function rate show pci/0000:03:00.0/some_group
pci/0000:03:00.0/some_group type node
Display leaf rate object rate values using IEC units:
# devlink -i port function rate show pci/0000:03:00.0/2
pci/0000:03:00.0/2 type leaf 11718Kibit
Display pci/0000:03:00.0/2 leaf rate object as pretty JSON output:
# devlink -jp port function rate show pci/0000:03:00.0/2
{
"rate": {
"pci/0000:03:00.0/2": {
"type": "leaf",
"tx_share": 1500000
}
}
}
Create node rate object with name "1st_group" on pci/0000:03:00.0 device:
# devlink port function rate add pci/0000:03:00.0/1st_group
Create node rate object with specified parameters:
# devlink port function rate add pci/0000:03:00.0/2nd_group \
tx_share 10Mbit tx_max 30Mbit parent 1st_group
Set parameters to the specified leaf rate object:
# devlink port function rate set pci/0000:03:00.0/1 \
tx_share 2Mbit tx_max 10Mbit
Set leaf's parent to "1st_group":
# devlink port function rate set pci/0000:03:00.0/1 parent 1st_group
Unset leaf's parent:
# devlink port function rate set pci/0000:03:00.0/1 noparent
Delete node rate object:
# devlink port function rate del pci/0000:03:00.0/2nd_group
Rate values can be specified in bits or bytes per second (bit|bps), with
any SI (k, m, g, t) or IEC (ki, mi, gi, ti) prefix. A bare number means
bits per second. Units are also printed in "show" command output, but not
necessarily the same ones that were specified with the "set" or "add" command.
The -i/--iec switch forces output in IEC units. JSON output always prints
values as bytes per second.
====================
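The relation between the example values above (12Mbit, 11718Kibit and 1500000 in JSON) can be checked with a short sketch. The helper names are hypothetical, and truncating integer division is assumed for the IEC output:

```c
#include <stdint.h>

/* Convert a rate in bits per second to bytes per second. */
uint64_t bits_to_bytes_per_sec(uint64_t bits)
{
	return bits / 8;
}

/* Convert a rate in bits per second to Kibit per second (IEC, 1024 based),
 * truncating any fractional part (assumption). */
uint64_t bits_to_kibit(uint64_t bits)
{
	return bits / 1024;
}
```

For 12Mbit, that is 12000000 bits per second, this gives 1500000 bytes per second and 11718 Kibit.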
Signed-off-by: David Ahern <dsahern@kernel.org>
Implement user commands to manage devlink port func rate objects.
List all rate commands:
$ devlink port func rate help
or just
$ devlink port func rate
To list all rate objects or a particular one:
$ devlink port func rate show
pci/0000:03:00.0/some_group: type node
pci/0000:03:00.0/0: type leaf
pci/0000:03:00.0/1: type leaf
$ devlink port func rate show pci/0000:03:00.0/1
pci/0000:03:00.0/1: type leaf
$ devlink port func rate show pci/0000:03:00.0/some_group
pci/0000:03:00.0/some_group: type node
A rate object of type "leaf" is created by its driver, and its name is the
name of the corresponding devlink port. A rate object of type "node" represents
a rate group created by the user using the commands:
$ devlink port func rate add pci/0000:03:00.0/some_group
or with defining tx rate limits
$ devlink port func rate add pci/0000:03:00.0/some_group \
tx_share 10kbit tx_max 100mbit
NOTE: node name cannot be a decimal value because it conflicts with
devlink port indexes.
To delete node object:
$ devlink port func rate del pci/0000:03:00.0/some_group
Set rate limits of existing rate object:
$ devlink port func rate set pci/0000:03:00.0/0 \
tx_share 5MBps tx_max 25GBps
$ devlink port func rate set pci/0000:03:00.0/some_group \
tx_share 0
Both SET and ADD commands accept any units of rates defined in IEC
60027-2 standard.
NOTE: a rate value of 0 means that the rate is unlimited. Such a value is
also omitted in show command output.
NOTE: In SHOW command output rate values will be printed with suffixes
as well, but in JSON output they are always units of Bps.
Set or unset parent of existing rate object:
$ devlink port func rate set pci/0000:03:00.0/0 parent some_group
$ devlink port func rate set pci/0000:03:00.0/0 noparent
NOTE: Due to kernel logic, setting the parent to an empty ("") name unsets
the parent. It shouldn't be used, to avoid unexpected parent unsets.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Every handler argument is validated in two steps. The first step, form
checking, expects the identifier to be a few words separated by slashes.
For device and region handlers, the code only checks that the identifier has
the expected number of slashes.
Add a generic function to do that and make the code cleaner and more consistent.
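A form check of this kind might look like the sketch below. The names slash_count() and ident_has_slashes() are hypothetical and are not claimed to match the functions added by the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the '/' characters in a string. */
size_t slash_count(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (*s == '/')
			n++;
	return n;
}

/* Hypothetical form check: the identifier is well formed when it
 * contains exactly the expected number of slashes. */
bool ident_has_slashes(const char *ident, size_t expected)
{
	return ident != NULL && slash_count(ident) == expected;
}
```

For example, a device handle such as pci/0000:03:00.0 has one slash, while a port handle such as pci/0000:03:00.0/1 has two.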
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A user can optionally provide the external controller number when
creating a devlink port for the external controller.
An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
There are more and more global environment variables that land everywhere
in configure, which makes it hard for users to know which one does what.
Using command-line options would make it easier for users to learn or
remember the config options.
This patch converts the INCLUDE variable to a command-line option first. The
first argument is checked for a leading '-' so that the old INCLUDE path
setting method keeps working.
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The '-b' option allows requesting BPF filter opcodes; however, the
kernel currently returns only classic BPF filters, so
reflect this in the man page.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for matching on the "related" ct_state flag.
The related state indicates a packet is associated with an existing
connection.
Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state -est-rel+trk \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state +rel+trk \
action mirred egress redirect dev ens1f0_1
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
genl_add_mcast_grp doesn't set errno in all cases.
On kernels that support mptcp but lack event support (all kernels <= 5.11)
MPTCP_PM_EV_GRP_NAME won't be found and ip will exit with
"can't subscribe to mptcp events: Success"
Set errno to a meaningful value (ENOENT) when the group name isn't found
and also cover other spots where it returns nonzero with errno unset.
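The pattern can be sketched as follows. find_group() is a hypothetical lookup used for illustration, not the code in genl_add_mcast_grp: every failure path sets errno, so the caller's error message no longer prints "Success".

```c
#include <errno.h>
#include <string.h>

/* Hypothetical lookup: returns the index of name in groups, or -1 with
 * errno set to ENOENT when the name is not present. */
int find_group(const char *const *groups, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(groups[i], name) == 0)
			return i;

	errno = ENOENT;
	return -1;
}
```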
Fixes: ff619e4fd3 ("mptcp: add support for event monitoring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit d5e6ee0dac introduced the use of the functions name_to_handle_at() and
open_by_handle_at(). But these functions are not available e.g. in
uclibc-ng < 1.0.35. For backward compatibility, check for their
availability in the configure script and, if they are absent, do a direct
syscall.
Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
config.mk needs to be re-generated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file exists and is newer than the configure script.
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
We introduce the "count" optional attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters by specifying the "count" attribute as follows:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
v2:
- add help and route.8 man page updates
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"
The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.
This commit tries to improve this by simply changing the error string to
the parameter name provided in the user-given command, clearly pointing
out where the wrong value is.
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"
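The idea can be sketched as below. Both helpers are hypothetical and only mirror the message text shown above, not the actual code in tc/q_htb.c: the error string is built from the keyword the user typed rather than from a fixed synonym.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if arg is one of the accepted spellings of the parameter. */
int is_burst_keyword(const char *arg)
{
	return strcmp(arg, "burst") == 0 ||
	       strcmp(arg, "buffer") == 0 ||
	       strcmp(arg, "maxburst") == 0;
}

/* Build the error text from the keyword the user actually typed.
 * Returns what snprintf() returns. */
int format_illegal(char *buf, size_t len, const char *typed_keyword)
{
	return snprintf(buf, len, "Illegal \"%s\"", typed_keyword);
}
```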
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
tipc segfaults when called with an abnormally long key:
$ tipc node set key 0123456789abcdef0123456789abcdef0123456789abcdef
*** buffer overflow detected ***: terminated
Fix this by returning an error if the key length is longer than
TIPC_AEAD_KEYLEN_MAX.
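A length check of this kind might look like the following sketch. copy_key() is a hypothetical helper, and EXAMPLE_KEYLEN_MAX (36) is an assumed stand-in for TIPC_AEAD_KEYLEN_MAX, whose real value should be taken from the uapi header:

```c
#include <errno.h>
#include <string.h>

/* Assumed limit for illustration only. */
#define EXAMPLE_KEYLEN_MAX 36

/* Copy key into dst only if it fits both the protocol limit and the
 * destination buffer; otherwise return -EINVAL without writing. */
int copy_key(char *dst, size_t dst_len, const char *key)
{
	size_t len = strlen(key);

	if (len > EXAMPLE_KEYLEN_MAX || len >= dst_len)
		return -EINVAL;

	memcpy(dst, key, len + 1); /* include the terminating NUL */
	return 0;
}
```

With this assumed limit, the 48-character key from the example above is rejected instead of overflowing the buffer.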
Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
tipc segfaults when called with an abnormally long algname:
$ tipc node set key 0x1234 algname supercalifragilistichespiralidososupercalifragilistichespiralidoso
*** buffer overflow detected ***: terminated
Fix this by returning an error if the provided algname is longer than
TIPC_AEAD_ALG_NAME.
Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When receiving a result from the first query to netlink, we may execute
another query inside the callback. If this sub-routine is called
on the same socket, the result from the previous execution will be
discarded.
To avoid this, we perform the nested query on a separate socket.
Fixes: 2021028306 ("tipc: use the libmnl functions in lib/mnl_utils.c")
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.
Signed-off-by: Tyson Moore <tyson@tyson.me>
Signed-off-by: David Ahern <dsahern@kernel.org>
main() dynamically allocates dcb, but when dcb_help() is called it
returns without freeing it.
Fix this by using a goto, as is already done elsewhere in the same function.
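The shape of the fix can be sketched as below. tool_main(), struct tool_state and the live_allocs counter are hypothetical and exist only to make the single cleanup path visible; this is not the dcb code itself:

```c
#include <stdlib.h>

/* Counts allocations not yet freed (for demonstration only). */
int live_allocs;

struct tool_state {
	int verbose;
};

int tool_main(int want_help)
{
	struct tool_state *st;
	int ret = 0;

	st = calloc(1, sizeof(*st));
	if (!st)
		return 1;
	live_allocs++;

	if (want_help) {
		/* Help would be printed here; leave through the shared
		 * cleanup path instead of returning directly. */
		goto out;
	}

	st->verbose = 1; /* stand-in for the real work */

out:
	free(st);
	live_allocs--;
	return ret;
}
```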
Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.
Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
In function bpf_obj_open, if bpf_fetch_prog_arg() returns an error, we
end up in the out: path with a negative value for fd, and pass it to
close.
Avoid this by checking for fd to be positive.
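A guard of this kind can be sketched as follows. close_if_valid() is a hypothetical helper, and treating every non-negative value as a valid descriptor is an assumption of this sketch rather than a statement about the exact condition used in the patch:

```c
#include <unistd.h>

/* Close fd only when it can refer to an open descriptor.
 * Returns 0 when nothing was closed, otherwise the result of close(). */
int close_if_valid(int fd)
{
	if (fd < 0)
		return 0;
	return close(fd);
}
```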
Fixes: 32e93fb7f6 ("{f,m}_bpf: allow for sharing maps")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
parse_nbands()
- if "quantum" is specified, nquanta cannot be negative, see
parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
specified and nbands is 0, terminating execution.
Thus nbands cannot be < 1 at this point and this code cannot be executed.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>