ip vrf exec requires root or CAP_NET_ADMIN, CAP_SYS_ADMIN and
CAP_DAC_OVERRIDE. It is not possible to run unprivileged commands like
ping as non-root or non-cap-enabled due to this requirement.
To allow users and administrators to safely add the required
capabilities to the binary, drop all capabilities on start if not
invoked with "vrf exec".
Update the manpage with the requirements.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.
Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.
Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Zubkov <green@msu.ru>
This reverts commit 9135c4d603.
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
William reported ip hanging and bisected to a recent commit for batching
allowing more than 1 command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times -- 1 response for
each command in the message.
Fixes: 72a2ff3916 ("lib/libnetlink: Add a new function rtnl_talk_iov")
Signed-off-by: David Ahern <dsahern@gmail.com>
It's useful to be able to tell which section is being processed in the
ELF when this error is triggered, so print that detail.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The vxlan driver when a neighbor add/delete event occurs sends
NDA_DST filled with a union:
union vxlan_addr {
struct sockaddr_in sin;
struct sockaddr_in6 sin6;
struct sockaddr sa;
};
This eventually calls rt_addr_n2a_r which had no handler for the
AF_BRIDGE family and "???" was being printed.
Add code to properly display this data when requested.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow to put -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion and we can assume there is nothing to color in
the JSON output.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
It will fail with EPERM on Linux 4.15.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Make JSON output work with RED Qdiscs. Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
First is used to get address from netlink attribute to
inet_prefix data structure. Use memcpy() with constant
value to let complier optimize by replacing a call by
inlining load/store instructions.
Second is used to match address in given netlink attribute
with one given as reference. It matches successfully if
no attribute is given (@rta is NULL), reference address
family is AF_UNSPEC or it's length isn't given; fails if
get_attr_rta() can't get attribute or it's family does
not match reference; calls inet_addr_match() to get final
verdict.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Both geneve and vxlan modules are converted to
use get_addr() we can replace inet_get_addr()
in less problematic places and finally get
rid of inet_get_addr().
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
It looks very useful to receive additional information
from get_addr_1() and get_addr() about address to simplify
caller and get rid of code duplications.
For now following information can be returned:
1) address is unspecified (zero)
2) address is multicast
3) address is internet: family is either AF_INET or
AF_INET6.
More information can be added in the future.
Introduce inline helpers to make code using this new
address classification interface more self explaining:
bool is_addrtype_inet(inet_prefix *addr)
true if @addr is inet address
bool is_addrtype_inet_unspec(inet_prefix *addr)
true if @addr is unspecified inet address
bool is_addrtype_inet_multi(inet_prefix *addr)
true if @addr is multicast inet address
bool is_addrtype_inet_not_unspec(inet_prefix *addr)
true if @addr is not unspecified inet address
false if @addr is not inet or unspecified inet
bool is_addrtype_inet_not_multi(inet_prefix *addr)
true if @addr is not multicast inet address
false if @addr is not inet or multicast inet
Last two are useful for case when we need inet address
that is not unspecified or multicast.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Handle default/all/any special case in get_addr_1() to setup
->family and ->bytelen correctly.
Make get_addr_1() return ->bitlen == -2 instead of -1 to
distinguish default/all/any special case from the rest:
it is safe because all callers check ->bitlen < 0, not
explicit value -1.
Reduce intendation by one level and get rid of goto/label
to make code more readable.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Set ->family correctly when string representing address
is "default", "all" or "any": get_addr_1() might be called
with AF_UNSPEC (e.g. get_addr() -> get_addr_1()).
Extend support for zero address to all address families,
not only AF_INET and AF_INET6 when one explicitly given
as @family: use af_byte_len() to correctly set address length.
Still assume AF_INET when @family is AF_UNSPEC.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
When program is loaded with a specified ifindex, use that
ifindex also when creating maps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
rtnl_talk can only send a single message to kernel. Add a new function
rtnl_talk_iov that can send multiple messages to kernel.
rtnl_talk_iov takes struct iovec * and iovlen as arguments.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
ll_addr_n2a() correctly prints tunnel endpoints for gre, ipip, sit
and ip6tnl, but not for ip6gre. Fix this by adding ARPHRD_IP6GRE to
IPv6 tunnel endpoing address conversion.
Before:
-------
$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd \
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
After:
------
$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
link/gre6 :: brd ::
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
When running "ip route list default" and not specifying address family,
one will get all of the routes instead of just default only. The same
is for "exact default" and "match default".
It behaves in such a way because default route with unspecified family
has the same all-zeroes value like no prefix specified at all. Thus
following code blindly ignores the fact, that prefix was actually
specified.
This patch adds the flag PREFIXLEN_SPECIFIED to the default route too.
And then checks its value when filtering routes.
Signed-off-by: Alexander Zubkov <green@msu.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The makeargs() function did not handle end of string correctly
and would reference past end of string.
Found by fuzzing with ASAN.
Reported-by:Bug Basher <iamliketohack@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
For BPF offload we need to specify the ifindex when program is
loaded now. Extend the bpf common code to accommodate that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Expose bpf_parse_common() and bpf_load_common() functions
for those users who may want to modify the parameters to
load after parsing is done.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_parse_common() parses and loads the program. Rename it
accordingly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Parsing command line is currently done together with potentially
loading a new eBPF program. This makes it more difficult to
provide additional parameters for loading (which may come after
the eBPF program info on the command line).
Split the two (only internally for now). Verbose parameter
has to be saved in struct bpf_cfg_in to be carried between
the stages.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
struct bpf_cfg_in already carries a pointer to sock_filter ops.
It's currently set to a local variable in bpf_parse_opt_tbl(),
shared between parsing and loading stages. Move the array
entirely to struct bpf_cfg_in, this will allow us to split
parsing and loading.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_parse() will parse command line arguments to find out the
program mode. This mode will later be needed at loading time.
Instead of keeping it locally add it to struct bpf_cfg_in,
this will allow splitting parsing and loading stages.
enum bpf_mode has to be moved to the header file, because C
doesn't allow forward declaration of enums.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Program type is needed both for parsing and loading of
the program. Parsing may also induce the type based on
signatures from __bpf_prog_meta. Instead of passing
the type around keep it in struct bpf_cfg_in.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.
Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.
Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
If program loading fails before verifier prints its first
message, the verifier log will not be initialized. Always
set the first character of the log buffer to zero to make
sure we don't dump non-printable characters to the terminal.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Kernel can now return non-fatal error messages in extack facility.
Update iproute2 to dump to use if present.
- rename nl_dump_ext_err to nl_dump_ext_ack
- rename errmsg to msg
- add call to nl_dump_ext_ack in rtnl_dump_done and __rtnl_talk for
non-error path
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.
With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.
With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.
We need to free answer after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
With commit 72b365e8e0 ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the VFs number is
increasing all the time. Some customers even use more than 200 VFs now.
We could not double it everytime when the buffer is not enough. Let's just
not hard code the buffer size and malloc the correct number when running.
Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller need to free it after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introducing color enum COLOR_NONE, which is not only duplicite of
COLOR_CLEAR, but also caused segfault, when running ip with --color
switch, as 'attr + 8' in color_fprintf() access array item out of
bounds. Thus removing it and restoring "magic" offset + 7.
Reproduce with:
$ ip -c a
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Keep it as simple as possible for now: just escape anything that is not
isprint-able, is among the "escape" parameter or '\' as an octal escape
sequence. This should be pretty easy to extend if any other user needs
something more complex in the future.
Signed-off-by: Ivan Delalande <colona@arista.com>
Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.
This fixes the following static link error with uClibc-ng:
.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
The original problem was that something like:
| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.
Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.
Signed-off-by: Phil Sutter <phil@nwl.cc>
As Stephen Hemminger mentioned on the last submission the new_json_obj
function is always called with fp == stdout, so right now, there's no
need of this extra argument.
The background for the rework is the following:
The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.
How to reproduce:
$ ip -t mon label link
(gdb) bt
.#0 _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d “%d: “, ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1 0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d “%d: “) at color.c:108
.#2 0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc “ifindex”,
fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
.#3 0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
.#4 print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5 0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6 0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
at libnetlink.c:761
.#7 0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8 0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
.#9 0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Move the json printer which is based on json writer into the
iproute2 library, so it can be used by library code and tools
other than ip. Should probably have been done from the beginning
like that given json writer is in the library already anyway.
No functional changes.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Julien Fortin <julien@cumulusnetworks.com>
As David Laight correctly pointed out, the first version of strlcpy()
modified dst buffer behind the string copied into it. Fix this by
writing NUL to the byte immediately following src string instead of to
the last byte in dst. Doing so also allows to reduce overhead by using
memcpy().
Improve strlcat() by avoiding the call to strlcpy() if dst string is
already full, not just as sanity check.
Signed-off-by: Phil Sutter <phil@nwl.cc>