iproute2

Daniel Borkmann 4bd624467b tc: built-in eBPF exec proxy

This work follows upon commit `6256f8c9e4` ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same unix
domain socket as demonstrated in the bpf_agent, the shell spawned via
execvpe(2) and the map fds passed over the environment, and thus are made
available to applications in the fashion of std{in,out,err} for read/write
access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6        <- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5        <- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7        <- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications can
easily read/write eBPF maps simultaneously.

To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>

2015-04-27 16:39:23 -07:00
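
As an illustration of how a program started under the exec proxy might pick up
the map descriptors, here is a minimal sketch. It relies only on the
BPF_NUM_MAPS and BPF_MAP<id> variables shown in the env output of the commit
message. The helper name bpf_env_map_fds() is hypothetical and not part of
iproute2, and the sketch assumes that map ids are dense, running from 0 to
BPF_NUM_MAPS - 1, as they are in that example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a non-negative decimal int; returns -1 on any malformed input. */
static int parse_nonneg_int(const char *s)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;
	return (int)v;
}

/*
 * Hypothetical helper (not part of iproute2): collect the map fds that
 * the exec proxy exports as BPF_NUM_MAPS and BPF_MAP<id>.
 * Assumption: ids run from 0 to BPF_NUM_MAPS - 1 without gaps.
 * Returns the number of fds stored in fds[], or -1 on error.
 */
int bpf_env_map_fds(int *fds, int max_fds)
{
	const char *num = getenv("BPF_NUM_MAPS");
	char name[32];
	int i, n;

	assert(fds != NULL);
	if (!num)
		return -1;

	n = parse_nonneg_int(num);
	if (n < 0 || n > max_fds)
		return -1;

	for (i = 0; i < n; i++) {
		const char *val;

		snprintf(name, sizeof(name), "BPF_MAP%d", i);
		val = getenv(name);
		if (!val)
			return -1;
		fds[i] = parse_nonneg_int(val);
		if (fds[i] < 0)
			return -1;
	}
	return n;
}
```

With the environment from the example above, fds[0], fds[1] and fds[2] would
be 5, 6 and 7, indexed by map id. The descriptors themselves are only
meaningful inside a process that inherited them from the proxy.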
.gitignore	Add ignore files to make using git easier	2006-08-08 12:04:38 -07:00
Makefile	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
README.last	(Logical change 1.3)	2004-04-15 20:56:59 +00:00
e_bpf.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
em_canid.c	Ematch used to classify CAN frames according to their identifiers	2012-08-20 13:11:55 -07:00
em_cmp.c	Fix wrong comparison in cmp_print_eopt()	2011-10-07 11:16:15 -07:00
em_ipset.c	tc: add ipset ematch	2012-08-13 08:33:50 -07:00
em_meta.c	add missing underscore to man page and example nf_mark ematch	2014-10-09 08:24:00 -07:00
em_nbyte.c	tc: remove dlfcn.h from files that dont need it	2009-11-13 14:14:07 -08:00
em_u32.c	tc: remove dlfcn.h from files that dont need it	2009-11-13 14:14:07 -08:00
emp_ematch.l	fix build issues with flex ver 2.5	2010-04-22 15:27:42 -07:00
emp_ematch.y	emp: fix warning on deprecated bison directive	2014-10-09 08:31:10 -07:00
f_basic.c	discourage use of direct policer interface	2014-10-09 08:26:57 -07:00
f_bpf.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
f_cgroup.c	discourage use of direct policer interface	2014-10-09 08:26:57 -07:00
f_flow.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
f_fw.c	discourage use of direct policer interface	2014-10-09 08:26:57 -07:00
f_route.c	route classifier support for multiple actions	2014-10-09 08:26:57 -07:00
f_rsvp.c	discourage use of direct policer interface	2014-10-09 08:26:57 -07:00
f_tcindex.c	tcindex classifier support for multiple actions	2014-10-09 08:26:56 -07:00
f_u32.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
m_action.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
m_bpf.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
m_connmark.c	tc: add support for connmark action	2015-04-13 10:49:45 -07:00
m_csum.c	csum action, fix typo	2012-03-15 14:24:59 -07:00
m_ematch.c	Fix NULL pointer reference when using basic match	2010-07-29 18:03:35 -07:00
m_ematch.h	include needed files	2012-12-23 11:49:06 -08:00
m_estimator.c	ip: make local functions static	2013-02-12 11:38:35 -08:00
m_gact.c	ip: make local functions static	2013-02-12 11:38:35 -08:00
m_ipt.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
m_mirred.c	More minor spelling fixes	2013-08-04 15:10:05 -07:00
m_nat.c	action: typo nat fix	2013-09-30 21:31:40 -07:00
m_pedit.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
m_pedit.h	Remove trailing whitespace	2006-12-05 10:10:22 -08:00
m_police.c	Remove unnecessary debug statement	2014-05-28 16:54:26 -07:00
m_simple.c	tc: fix compilation warning on 32bits arch	2015-04-27 11:41:46 -07:00
m_skbedit.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
m_vlan.c	actions: Get vlan action to work in pipeline	2015-01-13 17:22:44 -08:00
m_xt.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
m_xt_old.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
p_icmp.c	Remove trailing whitespace	2006-12-05 10:10:22 -08:00
p_ip.c	Remove trailing whitespace	2006-12-05 10:10:22 -08:00
p_tcp.c	Remove trailing whitespace	2006-12-05 10:10:22 -08:00
p_udp.c	Remove trailing whitespace	2006-12-05 10:10:22 -08:00
q_atm.c	Convert to use rta_getattr_ functions	2012-04-10 08:47:55 -07:00
q_cbq.c	linklayer interface between kernel and tc/userspace	2013-09-03 08:21:24 -07:00
q_choke.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
q_codel.c	tc-codel: Update usage text	2012-05-24 15:02:05 -07:00
q_drr.c	Convert to use rta_getattr_ functions	2012-04-10 08:47:55 -07:00
q_dsmark.c	Convert to use rta_getattr_ functions	2012-04-10 08:47:55 -07:00
q_fifo.c	iproute2: clearer error messages for fifo and tbf qdiscs	2013-02-21 08:34:34 -08:00
q_fq.c	fq: allow options of fair queue set to ~0U	2014-06-09 12:42:36 -07:00
q_fq_codel.c	fq_codel: Fair Queue Codel AQM	2012-05-22 14:17:49 -07:00
q_gred.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
q_hfsc.c	HFSC (7) & (8) documentation + assorted changes	2011-11-02 16:33:50 -07:00
q_hhf.c	support for Heavy Hitter Filter (HHF) qdisc	2014-05-09 12:10:47 -07:00
q_htb.c	htb: Move direct_qlen code part to htb_parse_opt().	2014-03-21 14:20:06 -07:00
q_ingress.c	tc: remove stale code	2010-01-21 10:13:01 -08:00
q_mqprio.c	ip: make local functions static	2013-02-12 11:38:35 -08:00
q_multiq.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
q_netem.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
q_pie.c	PIE: Proportional Integral controller Enhanced	2014-01-09 22:50:47 -08:00
q_prio.c	tc: prio: Perform more strict check on priomap.	2012-06-18 12:25:08 -07:00
q_qfq.c	Convert to use rta_getattr_ functions	2012-04-10 08:47:55 -07:00
q_red.c	Convert to use rta_getattr_ functions	2012-04-10 08:47:55 -07:00
q_rr.c	ip: make local functions static	2013-02-12 11:38:35 -08:00
q_sfb.c	tc : SFB flow scheduler	2011-04-12 14:27:37 -07:00
q_sfq.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
q_tbf.c	Fixed 'tc qdisc show' for tbf when latency<0	2014-05-28 17:08:16 -07:00
static-syms.c	Fix build when shared libraries are disabled	2013-03-13 08:29:59 -07:00
tc.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
tc_bpf.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
tc_bpf.h	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
tc_cbq.c	Replace "usec" by "time" in function names	2007-03-13 14:42:17 -07:00
tc_cbq.h	(Logical change 1.3)	2004-04-15 20:56:59 +00:00
tc_class.c	tc class: Show classes as ASCII graph	2014-12-27 10:16:51 -08:00
tc_common.h	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
tc_core.c	htb: support 64bit rates	2013-11-22 17:36:18 -08:00
tc_core.h	htb: support 64bit rates	2013-11-22 17:36:18 -08:00
tc_estimator.c	Introduce TIME_UNITS_PER_SEC to represent internal clock resolution	2007-03-13 14:42:16 -07:00
tc_exec.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
tc_filter.c	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00
tc_monitor.c	ip: make local functions static	2013-02-12 11:38:35 -08:00
tc_qdisc.c	whitespace cleanup	2014-12-20 15:47:17 -08:00
tc_red.c	red: give a hint about burst value	2011-12-01 09:23:43 -08:00
tc_red.h	(Logical change 1.3)	2004-04-15 20:56:59 +00:00
tc_stab.c	iproute2: various header include fixes for compiling with musl libc	2014-05-28 16:51:39 -07:00
tc_util.c	tc util: Fix possible buffer overflow when print class id	2015-04-20 10:06:02 -07:00
tc_util.h	tc: built-in eBPF exec proxy	2015-04-27 16:39:23 -07:00

README.last

Kernel code and interface.
--------------------------

* Compile time switches

There is only one, but very important, compile time switch.
It is not settable by "make config", but should be selected
manually, after a bit of thinking, in <include/net/pkt_sched.h>.

PSCHED_CLOCK_SOURCE can take three values:

	PSCHED_GETTIMEOFDAY
	PSCHED_JIFFIES
	PSCHED_CPU
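
For orientation, selecting a clock source would presumably look like the
fragment below. This assumes that the switch is an ordinary preprocessor
define in the header named above; the exact surrounding code in that header
is not shown here.

```c
/* Assumed form of the switch in include/net/pkt_sched.h; pick exactly one
 * of PSCHED_GETTIMEOFDAY, PSCHED_JIFFIES or PSCHED_CPU. */
#define PSCHED_CLOCK_SOURCE	PSCHED_JIFFIES
```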


 PSCHED_GETTIMEOFDAY

Default setting is the most conservative PSCHED_GETTIMEOFDAY.
It is very slow, both because do_gettimeofday() is itself slow
and because it forces the code to use the unnatural "timeval" format,
where the seconds and microseconds fields are separate.
Besides that, it will misbehave when delays exceed 2 seconds
(e.g. on very slow links or with classes bound to a small slice of bandwidth).
To sum up: as soon as you get it working, select the correct clock
source and forget about PSCHED_GETTIMEOFDAY forever.


 PSCHED_JIFFIES

Clock is derived from jiffies. On architectures with HZ=100
the granularity of this clock is not enough to make reasonable
bindings to real time. However, taking into account Linux
architecture problems, which force us to use an artificial
integrated clock in any case, this switch is not so bad
for scheduling even on high speed networks, though policing
is not reliable.
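
A rough calculation shows how coarse this clock is. With HZ=100 one jiffy
lasts 10 ms; on a 100 Mbit/s link (a rate chosen here purely for
illustration) 125000 bytes, roughly 83 full-size 1500-byte frames, can pass
within a single tick. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one jiffy in microseconds for a given HZ. */
uint32_t tick_usec(uint32_t hz)
{
	assert(hz > 0);
	return 1000000u / hz;
}

/* Bytes that a link of rate_bps bits per second carries during one jiffy. */
uint64_t bytes_per_tick(uint64_t rate_bps, uint32_t hz)
{
	assert(hz > 0);
	return rate_bps / 8 / hz;
}
```

Any decision taken on this clock cannot distinguish packets that arrive
within the same tick, which is consistent with the remark above that
policing is not reliable.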


 PSCHED_CPU

It is available only on Alpha and on Pentiums with a correct
CPU timestamp counter. It is the fastest way; use it when it is
available, but remember: not all Pentiums have this facility, and
on a lot of them the clock is broken by APM etc.