ip vrf exec requires root or CAP_NET_ADMIN, CAP_SYS_ADMIN and
CAP_DAC_OVERRIDE. It is not possible to run unprivileged commands like
ping as non-root or non-cap-enabled due to this requirement.
To allow users and administrators to safely add the required
capabilities to the binary, drop all capabilities on start if not
invoked with "vrf exec".
Update the manpage with the requirements.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for devlink resource abstraction. The resources are
represented by a tree based structure and are identified by a name and
a size. Some resources can present their real time occupancy.
First the resources exposed by the driver can be observed, for example:
$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry
resources:
name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
Some resource's size can be changed. Examples:
$devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368
The changes do not apply immediately, this can be validate by the 'size_new'
attribute, which represents the pending changed size. For example
$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry size_valid false
resources:
name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
In case of a pending change the nested resources present an indication
for a valid configuration of its children (sum of its children sizes
doesn't exceed the parent's size).
In order for the changes to take place hot reload is needed. The hot
reload through devlink will be introduced in the following patch.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow to put -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion and we can assume there is nothing to color in
the JSON output.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Make JSON output work with RED Qdiscs. Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
First is used to get address from netlink attribute to
inet_prefix data structure. Use memcpy() with constant
value to let complier optimize by replacing a call by
inlining load/store instructions.
Second is used to match address in given netlink attribute
with one given as reference. It matches successfully if
no attribute is given (@rta is NULL), reference address
family is AF_UNSPEC or it's length isn't given; fails if
get_attr_rta() can't get attribute or it's family does
not match reference; calls inet_addr_match() to get final
verdict.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Both geneve and vxlan modules are converted to
use get_addr() we can replace inet_get_addr()
in less problematic places and finally get
rid of inet_get_addr().
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
It looks very useful to receive additional information
from get_addr_1() and get_addr() about address to simplify
caller and get rid of code duplications.
For now following information can be returned:
1) address is unspecified (zero)
2) address is multicast
3) address is internet: family is either AF_INET or
AF_INET6.
More information can be added in the future.
Introduce inline helpers to make code using this new
address classification interface more self explaining:
bool is_addrtype_inet(inet_prefix *addr)
true if @addr is inet address
bool is_addrtype_inet_unspec(inet_prefix *addr)
true if @addr is unspecified inet address
bool is_addrtype_inet_multi(inet_prefix *addr)
true if @addr is multicast inet address
bool is_addrtype_inet_not_unspec(inet_prefix *addr)
true if @addr is not unspecified inet address
false if @addr is not inet or unspecified inet
bool is_addrtype_inet_not_multi(inet_prefix *addr)
true if @addr is not multicast inet address
false if @addr is not inet or multicast inet
Last two are useful for case when we need inet address
that is not unspecified or multicast.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
rtnl_talk can only send a single message to kernel. Add a new function
rtnl_talk_iov that can send multiple messages to kernel.
rtnl_talk_iov takes struct iovec * and iovlen as arguments.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Incorporate upstream changes to fix compliation with MUSL.
See commit 6926e041a892
("uapi/if_ether.h: prevent redefinition of struct ethhdr")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
From upstream kernel commit f19397a5c65665d66e3866b42056f1f58b7a366b
bpf: Add access to snd_cwnd and others in sock_ops
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
For BPF offload we need to specify the ifindex when program is
loaded now. Extend the bpf common code to accommodate that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Expose bpf_parse_common() and bpf_load_common() functions
for those users who may want to modify the parameters to
load after parsing is done.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_parse_common() parses and loads the program. Rename it
accordingly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Parsing command line is currently done together with potentially
loading a new eBPF program. This makes it more difficult to
provide additional parameters for loading (which may come after
the eBPF program info on the command line).
Split the two (only internally for now). Verbose parameter
has to be saved in struct bpf_cfg_in to be carried between
the stages.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
struct bpf_cfg_in already carries a pointer to sock_filter ops.
It's currently set to a local variable in bpf_parse_opt_tbl(),
shared between parsing and loading stages. Move the array
entirely to struct bpf_cfg_in, this will allow us to split
parsing and loading.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_parse() will parse command line arguments to find out the
program mode. This mode will later be needed at loading time.
Instead of keeping it locally add it to struct bpf_cfg_in,
this will allow splitting parsing and loading stages.
enum bpf_mode has to be moved to the header file, because C
doesn't allow forward declaration of enums.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Program type is needed both for parsing and loading of
the program. Parsing may also induce the type based on
signatures from __bpf_prog_meta. Instead of passing
the type around keep it in struct bpf_cfg_in.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.
Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.
Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
1. Put the declarations of strlcpy and strlcat inside
an #ifdef NEED_STRLCPY. Their declarations were already in a
similar #ifdef.
2. In bpf_scm.h, include sys/un.h for struct sockaddr_un.
3. In utils.h, include time.h for struct timeval.
Tested: builds on ubuntu 14.04 with "make clean distclean; ./configure && make -j64"
Tested: 4.14.1 builds on Android with Android-specific #ifndefs for missing library code
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>