Daniel Borkmann 32e93fb7f6 {f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues in
tc's eBPF frontend: support for persistent file descriptors.
Whenever tc parses an ELF object and extracts and loads its maps
into the kernel, the resulting file descriptors are out of reach
once the tc instance exits.

This means that for simple (unnested) programs containing one or
more maps, the kernel holds a reference and the maps live on inside
the kernel until the program holding them is unloaded, but they are
out of reach for user space. It is even worse with tail calls,
which may also be multiple and nested.

To address this, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, so that map data can be further inspected or updated for a
specific use case. However, that approach is tied more towards
specific applications; it still doesn't easily allow for sharing
maps across multiple tc instances and would require a daemon to be
running in the background. For example, when a map should be shared
by two eBPF programs, one attached to ingress and one to egress,
this currently doesn't work with the tc frontend.

This work solves exactly that: if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object across its various program sections (PIN_OBJECT_NS)
without losing the file descriptor set. To make that happen, we use
eBPF object pinning, which kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") introduced for exactly this
purpose.
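The two namespaces differ in who ends up sharing a map. The C helper below is a rough sketch of that difference. It builds the bpffs path under which a map would be pinned for each mode. Several details are assumptions for illustration and are not taken from this patch: the `pin_path()` helper, the numeric PIN_* values, and the directory layout (`/sys/fs/bpf/tc/globals/<map>` for the global namespace, `/sys/fs/bpf/tc/<object id>/<map>` per object).

```c
#include <stdio.h>
#include <string.h>

/* Pinning modes as named in the commit message; the numeric values
 * are assumed for this sketch. */
enum pinning { PIN_NONE = 0, PIN_OBJECT_NS = 1, PIN_GLOBAL_NS = 2 };

/* Hypothetical helper: write the (assumed) bpffs pin path of a map
 * into buf. Returns 0 on success, -1 if the map is not pinned at all
 * or if buf is too small to hold the full path. */
static int pin_path(char *buf, size_t len, enum pinning mode,
		    const char *obj_id, const char *map_name)
{
	int n;

	switch (mode) {
	case PIN_GLOBAL_NS:
		/* Same path for every object file naming this map. */
		n = snprintf(buf, len, "/sys/fs/bpf/tc/globals/%s", map_name);
		break;
	case PIN_OBJECT_NS:
		/* Same path only for sections of the same object. */
		n = snprintf(buf, len, "/sys/fs/bpf/tc/%s/%s", obj_id, map_name);
		break;
	case PIN_NONE:
	default:
		/* Nothing is pinned: every load creates a new map. */
		return -1;
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under this model, the egress and ingress sections of one object resolve to the same per-object path. A different object file would only meet them through the global namespace.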

The examples/bpf/bpf_shared.c code shipped with this patch can be
used, for instance, as follows:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress. Had no sharing (PIN_NONE) been chosen, the dumped
map value would of course be 0, since two separate map instances
would have been created. With sharing, ingress sees the counter grow:

  [...]
          <idle>-0     [002] ..s. 38264.788234: : map val: 4
          <idle>-0     [002] ..s. 38264.788919: : map val: 4
          <idle>-0     [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
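Fetching a descriptor for a pinned map relies on the kernel's object pinning interface. The following is a minimal user-space sketch of that primitive, not tc's own code. It issues the `bpf(2)` syscall with the `BPF_OBJ_GET` command on a bpffs path. The helper name `bpf_obj_get_fd()` is hypothetical, and the code assumes kernel headers recent enough to define `BPF_OBJ_GET`.

```c
#define _GNU_SOURCE		/* for syscall() */
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: return a new fd referring to the eBPF object
 * pinned at pathname, or -1 with errno set (for example ENOENT when
 * nothing is pinned there, or EPERM without sufficient privileges). */
static int bpf_obj_get_fd(const char *pathname)
{
	union bpf_attr attr;

	/* Unused attribute fields must be zero for the kernel. */
	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)pathname;

	return (int)syscall(SYS_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

On success, the returned descriptor can be passed to further `bpf(2)` map commands such as `BPF_MAP_LOOKUP_ELEM`. It should be closed with `close()` when no longer needed.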

The patch has been tested extensively on both the classifier and
action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-23 16:10:44 -08:00
bridge bridge: fdb: minor syntax fix in help text 2015-11-03 16:27:39 -08:00
doc Add ip rule save/restore 2015-10-22 23:35:57 -07:00
etc/iproute2 rt_dsfield: fix Expedited Forwarding PHB 2014-12-03 18:50:59 -08:00
examples {f,m}_bpf: allow for sharing maps 2015-11-23 16:10:44 -08:00
genl Merge branch 'master' into net-next 2015-05-28 09:18:01 -07:00
include {f,m}_bpf: allow for sharing maps 2015-11-23 16:10:44 -08:00
ip iproute2: Ignore EADDRNOTAVAIL errors during address flush operation 2015-11-23 15:59:08 -08:00
lib Merge branch 'master' into net-next 2015-11-03 16:31:57 -08:00
man bridge.8: document fdb replace command 2015-11-23 15:58:07 -08:00
misc lnstat: fix header displaying mechanism 2015-11-23 15:54:05 -08:00
netem netem: fix installs of dist files 2010-07-31 19:31:04 -07:00
tc {f,m}_bpf: allow for sharing maps 2015-11-23 16:10:44 -08:00
testsuite tests: Add output testing 2015-06-24 23:37:26 -04:00
tipc tipc: fix bearer get/set help synopsis 2015-08-10 11:18:01 -07:00
.gitignore gitignore: Ignore 'doc' files generated at runtime 2014-10-29 22:26:15 -07:00
COPYING Update address of FSF in license 2008-03-08 13:31:03 -08:00
Makefile enable transparent LFS 2015-06-24 23:07:34 -04:00
README README: update mail address and download location 2013-01-18 09:54:58 -08:00
README.decnet Decnet documentation update 2005-06-13 18:47:56 +00:00
README.devel iproute2: fix minor typo in comments 2011-07-11 10:11:09 -07:00
README.distribution README cleanup's 2012-01-03 15:04:55 -08:00
README.iproute2+tc tc, bpf: finalize eBPF support for cls and act front-end 2015-04-10 13:31:19 -07:00
README.lnstat Rename: misc/README.lnstat -> README.lnstat 2004-10-19 20:24:47 +00:00
configure configure: Check for Berkeley DB for arpd compilation 2015-09-21 14:38:38 -07:00

README

This is a set of utilities for Linux networking.

Information:
    http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2

Download:
    http://www.kernel.org/pub/linux/utils/net/iproute2/

Repository:
    git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

How to compile this.
--------------------
1. libdbm

arpd needs the db4 development libraries. For Debian users,
this is the package with a name like libdb4.x-dev.
DBM_INCLUDE points to the directory containing db_185.h, which
is the include file arpd uses to access the old-format Berkeley
database routines. This file is often in the db-devel package.

2. make

The Makefile will automatically build a Config file that records,
among other things, whether ATM support is available.

3. To make documentation, cd to the doc/ directory, then
   look at the start of its Makefile and set correct values for
   PAGESIZE=a4		, e.g.: a4, letter ...	(string)
   PAGESPERPAGE=2	, e.g.: 1, 2 ...		(numeric)
   and run make there. It assumes that latex, dvips and psnup
   are in your path.

4. This package includes matching sanitized kernel headers because
   the build environment may not have up-to-date versions. See the
   Makefile if you have special requirements and need to point at
   different kernel include files.

Stephen Hemminger
stephen@networkplumber.org

Alexey Kuznetsov
kuznet@ms2.inr.ac.ru