This work follows up on commit 6256f8c9e4 ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
the generated eBPF map file descriptors.
File descriptors are fetched, based on their id, from the same unix
domain socket as demonstrated in bpf_agent; the shell is then spawned
via execvpe(2) with the map fds passed over the environment. The maps
are thus made available to applications for read/write access in the
same fashion as std{in,out,err}, for example in the case of iproute2's
examples/bpf/:
# env | grep BPF
BPF_NUM_MAPS=3
BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1)
BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0)
BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2)
# ls -la /proc/self/fd
[...]
lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
[...]
lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
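A minimal sketch of the spawning side, assuming the map fds have already been received from the unix domain socket: the caller's environment is extended with BPF_NUM_MAPS and BPF_MAP<i> entries, close-on-exec is cleared on each fd, and the command is started with execvpe(). The function name spawn_with_map_fds() and the MAX_MAPS limit are hypothetical, and this is not tc's actual code. The sketch forks first so that it can return the child's exit status; a front end could also call execvpe() directly and let the shell replace it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

#define MAX_MAPS 64 /* hypothetical upper bound for this sketch */

/* Run argv[0] with the current environment plus BPF_NUM_MAPS and
 * BPF_MAP<i>=<fd> entries, so the child inherits the map fds.
 * Returns the child's exit status, or -1 on error. */
static int spawn_with_map_fds(const int *fds, int n, char *const argv[])
{
	char entries[MAX_MAPS + 1][32];
	char **envp;
	size_t env_len = 0, k = 0;
	pid_t pid;
	int status, i;

	if (n < 0 || n > MAX_MAPS)
		return -1;
	while (environ[env_len])
		env_len++;
	envp = calloc(env_len + n + 2, sizeof(*envp)); /* zeroed, so NULL-terminated */
	if (!envp)
		return -1;

	/* Keep the caller's environment, minus any existing BPF_* entries. */
	for (size_t j = 0; j < env_len; j++)
		if (strncmp(environ[j], "BPF_", 4) != 0)
			envp[k++] = environ[j];

	snprintf(entries[0], sizeof(entries[0]), "BPF_NUM_MAPS=%d", n);
	envp[k++] = entries[0];
	for (i = 0; i < n; i++) {
		/* Clear close-on-exec so the fd survives execvpe(). */
		int flags = fcntl(fds[i], F_GETFD);

		if (flags < 0 || fcntl(fds[i], F_SETFD, flags & ~FD_CLOEXEC) < 0) {
			free(envp);
			return -1;
		}
		snprintf(entries[i + 1], sizeof(entries[i + 1]), "BPF_MAP%d=%d", i, fds[i]);
		envp[k++] = entries[i + 1];
	}

	pid = fork();
	if (pid < 0) {
		free(envp);
		return -1;
	}
	if (pid == 0) {
		execvpe(argv[0], argv, envp);
		_exit(127); /* exec failed */
	}
	free(envp);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing argv = { "sh", NULL } together with the received fds would give an interactive shell whose environment resembles the output above.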
The advantage (as opposed to direct/native usage) is that the shell
now owns the map fds, so applications can terminate and easily reattach
to the descriptors without any kernel changes. Moreover, multiple
applications can read/write eBPF maps simultaneously.
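On the application side, reattaching can be as simple as reading the environment again. The following sketch (the helper names parse_nonneg() and bpf_map_fd_from_env() are hypothetical) looks up BPF_MAP<id>, checks the id against BPF_NUM_MAPS, and uses fcntl(F_GETFD) to confirm that the descriptor is still open in the current process, which it is for as long as the inherited fd has not been closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a non-negative decimal int; returns -1 if s is not one. */
static int parse_nonneg(const char *s)
{
	char *end;
	long v;

	if (!s || !*s)
		return -1;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;
	return (int)v;
}

/* Return the fd published as BPF_MAP<id>, or -1 with errno set:
 * ENOENT if id is out of range, EINVAL if the entry is malformed,
 * EBADF if the descriptor is not open in this process. */
static int bpf_map_fd_from_env(int id)
{
	char name[32];
	int n = parse_nonneg(getenv("BPF_NUM_MAPS"));
	int fd;

	if (n < 0 || id < 0 || id >= n) {
		errno = ENOENT;
		return -1;
	}
	snprintf(name, sizeof(name), "BPF_MAP%d", id);
	fd = parse_nonneg(getenv(name));
	if (fd < 0) {
		errno = EINVAL;
		return -1;
	}
	if (fcntl(fd, F_GETFD) < 0)
		return -1; /* errno is EBADF */
	return fd;
}
```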
To make experimenting with this easier for users, the next step is to
add a small helper that handles simple data types, so that shell scripts
can also make use of the bpf syscall, e.g. to read from and write into
maps. Generally, this allows for prepopulating maps, for run-time
alterations that influence eBPF program behaviour (e.g. different
run-time classifications, skb modifications, ...), for dumping
statistics, etc.
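Such a helper is not part of this patch; what follows is only a sketch of what it might look like, assuming a map with 4-byte keys and 4-byte values. glibc provides no wrapper for bpf(2), so the sketch calls it through syscall(2) with union bpf_attr from <linux/bpf.h>, using the BPF_MAP_LOOKUP_ELEM and BPF_MAP_UPDATE_ELEM commands. The name map_rw_u32() and its string-based interface are hypothetical; taking decimal strings is meant to mirror how a shell script would pass arguments.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Raw bpf(2) call; glibc has no wrapper for it. */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
	return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Parse an unsigned 32-bit decimal value; returns 0 on success. */
static int parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long v;

	if (!s || !*s || *s == '-')
		return -1;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end != '\0' || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}

/* Hypothetical helper: with val_str == NULL, look up the u32 key and
 * store the value in *out; otherwise set key to the given u32 value.
 * Returns 0 on success, -1 with errno set on failure. */
static int map_rw_u32(int map_fd, const char *key_str, const char *val_str,
		      uint32_t *out)
{
	union bpf_attr attr;
	uint32_t key, val = 0;

	if (parse_u32(key_str, &key) || (val_str && parse_u32(val_str, &val))) {
		errno = EINVAL;
		return -1;
	}
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)map_fd;
	attr.key = (uint64_t)(uintptr_t)&key;
	attr.value = (uint64_t)(uintptr_t)&val;
	if (val_str) {
		attr.flags = BPF_ANY; /* create or overwrite */
		return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr);
	}
	if (sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr) < 0)
		return -1;
	if (out)
		*out = val;
	return 0;
}
```

A small command-line wrapper around map_rw_u32() (equally hypothetical) could then be invoked from a script with the fd taken from $BPF_MAP0, a key, and an optional new value.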
Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Kernel code and interface.
--------------------------
* Compile time switches
There is only one compile time switch, but it is a very important one.
It cannot be set via "make config"; it must be selected manually, after
a bit of thinking, in <include/net/pkt_sched.h>.
PSCHED_CLOCK_SOURCE can take three values:
    PSCHED_GETTIMEOFDAY
    PSCHED_JIFFIES
    PSCHED_CPU
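To illustrate how a compile time switch of this kind selects code, here is a small sketch. The numeric values of the three constants are assumptions made for the example, not copied from <include/net/pkt_sched.h>; PSCHED_JIFFIES is chosen only to show a non-default selection, and psched_clock_name() is a hypothetical helper.

```c
/* Hypothetical values; the real definitions live in pkt_sched.h. */
#define PSCHED_GETTIMEOFDAY 1
#define PSCHED_JIFFIES      2
#define PSCHED_CPU          3

/* The single switch, edited by hand rather than via "make config". */
#define PSCHED_CLOCK_SOURCE PSCHED_JIFFIES

/* Only the branch matching the switch is compiled in. */
static const char *psched_clock_name(void)
{
#if PSCHED_CLOCK_SOURCE == PSCHED_GETTIMEOFDAY
	return "gettimeofday";
#elif PSCHED_CLOCK_SOURCE == PSCHED_JIFFIES
	return "jiffies";
#elif PSCHED_CLOCK_SOURCE == PSCHED_CPU
	return "cpu";
#else
#error "PSCHED_CLOCK_SOURCE must be one of the three values above"
#endif
}
```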
PSCHED_GETTIMEOFDAY
The default setting is the most conservative one, PSCHED_GETTIMEOFDAY.
It is very slow, both because do_gettimeofday() is oddly slow and
because it forces the code to use the unnatural "timeval" format, where
the seconds and microseconds fields are separate.
Besides that, it misbehaves when delays exceed 2 seconds
(e.g. on very slow links or for classes bounded to a small slice of
bandwidth).
To summarize: as soon as you get it working, select the correct clock
source and forget about PSCHED_GETTIMEOFDAY forever.
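The cost of the timeval format shows up whenever two timestamps are subtracted. The sketch below contrasts an exact difference with a hypothetical cheap variant that only handles whole-second differences of 0, 1 or 2 and clamps anything else to 2 s; the cap is modelled on the 2-second limit mentioned above and is not taken from the kernel source. Both function names are illustrative.

```c
#include <sys/time.h>

/* Exact difference a - b in microseconds: the separate tv_sec and
 * tv_usec fields have to be recombined by hand. */
static long long tv_diff_us(const struct timeval *a, const struct timeval *b)
{
	return (long long)(a->tv_sec - b->tv_sec) * 1000000LL +
	       (long long)(a->tv_usec - b->tv_usec);
}

/* Hypothetical cheap variant: only second differences of 0, 1 or 2
 * are handled; anything else is clamped to 2 s. */
static long tv_diff_capped_us(const struct timeval *a, const struct timeval *b)
{
	long delta = (long)(a->tv_usec - b->tv_usec);

	switch (a->tv_sec - b->tv_sec) {
	case 0:
		return delta;
	case 1:
		return delta + 1000000L;
	case 2:
		return delta + 2000000L;
	default:
		return 2000000L; /* delays beyond 2 s are misreported */
	}
}
```

With t0 = {0, 500000} and t5 = {5, 0}, the exact difference is 4500000 us while the capped variant reports 2000000 us.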
PSCHED_JIFFIES
The clock is derived from jiffies. On architectures with HZ=100, the
granularity of this clock (10 ms) is too coarse to bind reasonably to
real time. However, given the Linux architecture problems that force
us to use an artificial integrated clock in any case, this switch is
not so bad for scheduling, even on high-speed networks, though
policing is not reliable.
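A quick worked example of the granularity problem, assuming a 100 Mbit/s link and 1500-byte packets (both numbers chosen only for illustration): with HZ=100 a tick lasts 10000 us, while one such packet takes 12000 bits / 100 Mbit/s = 120 us to transmit, so about 83 packets fall into a single tick and share one timestamp. That is tolerable for coarse scheduling but makes short-term rate measurements, as used by policing, unreliable. The helper names are hypothetical.

```c
#include <stdint.h>

/* Length of one clock tick in microseconds for a given HZ. */
static uint64_t tick_us(unsigned hz)
{
	return 1000000ULL / hz;
}

/* Number of full packets of pkt_bytes sent on a link of bits_per_sec
 * within one tick; a jiffies clock gives them all the same timestamp. */
static uint64_t pkts_per_tick(unsigned hz, uint64_t bits_per_sec,
			      unsigned pkt_bytes)
{
	return bits_per_sec / hz / ((uint64_t)pkt_bytes * 8);
}
```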
PSCHED_CPU
It is available only on Alpha and on Pentiums with a working CPU
timestamp counter. It is the fastest option; use it when it is
available, but remember: not all Pentiums have this facility, and many
of them have a clock broken by APM, etc.
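Reading the CPU timestamp counter is a single instruction, which is why this source is the fastest. The sketch below assumes GCC or Clang on x86, where __rdtsc() from <x86intrin.h> executes rdtsc; on other architectures (including Alpha, which is not handled here) it falls back to CLOCK_MONOTONIC so that it still compiles. read_cpu_clock() is a hypothetical name, and its value is in raw cycles on x86 but in nanoseconds in the fallback.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Read a fast free-running counter: the x86 TSC where available,
 * otherwise CLOCK_MONOTONIC in nanoseconds. */
static uint64_t read_cpu_clock(void)
{
#if defined(__x86_64__) || defined(__i386__)
	return (uint64_t)__rdtsc();
#else
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}
```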