iproute2

Commit Graph

Author	SHA1	Message	Date
Simon Horman	c2078f8dc4	tc: flower: Allow *_mac options to accept a mask The argument to src_mac and dst_mac may now take an optional mask to limit the scope of matching. * This address is documented as a LLADDR in keeping with ip-link(8). * The formats accepted match those already output when dumping flower filters from the kernel. Example of use of LLADDR with and without a mask: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \ src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \ src_mac 52:54:00:00:00:00/23 action drop tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \ src_mac 52:54:00:00:00:00 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-21 16:07:53 -08:00
Simon Horman	b2a1f740aa	tc: flower: document that *_ip parameters take a PREFIX as an argument. The arguments to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an optional prefix length which is used to provide a mask to limit the scope of matching. * This is documented as a PREFIX in keeping with ip-route(8). Example of uses of IPv4 and IPv6 prefixes: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: flower \ indev eth0 dst_ip 192.168.1.1 action drop tc filter add dev eth0 protocol ip parent ffff: flower \ indev eth0 src_ip 10.0.0.0/8 action drop tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 src_ip 2001:DB8:1::/48 action drop tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 dst_ip 2001:DB8::1 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-21 16:07:41 -08:00
Stephen Hemminger	8578bb731d	Revert "tc: flower: Allow *_mac options to accept a mask" This reverts commit `0390185078`.	2016-12-21 16:06:49 -08:00
Stephen Hemminger	10da552800	Revert "tc: flower: document that *_ip parameters take a PREFIX as an argument." This reverts commit `a8a1dccd2a`.	2016-12-21 16:06:35 -08:00
Simon Horman	0390185078	tc: flower: Allow *_mac options to accept a mask The argument to src_mac and dst_mac may now take an optional mask to limit the scope of matching. * This address is documented as a LLADDR in keeping with ip-link(8). * The formats accepted match those already output when dumping flower filters from the kernel. Example of use of LLADDR with and without a mask: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \ src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \ src_mac 52:54:00:00:00:00/23 action drop tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \ src_mac 52:54:00:00:00:00 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-21 15:56:39 -08:00
Simon Horman	a8a1dccd2a	tc: flower: document that *_ip parameters take a PREFIX as an argument. The arguments to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an optional prefix length which is used to provide a mask to limit the scope of matching. * This is documented as a PREFIX in keeping with ip-route(8). Example of uses of IPv4 and IPv6 prefixes: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: flower \ indev eth0 dst_ip 192.168.1.1 action drop tc filter add dev eth0 protocol ip parent ffff: flower \ indev eth0 src_ip 10.0.0.0/8 action drop tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 src_ip 2001:DB8:1::/48 action drop tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 dst_ip 2001:DB8::1 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-21 15:56:39 -08:00
Roman Mashak	530753184a	tc: pass correct conversion specifier to print 'unsigned int' action index. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-12-14 19:00:36 -08:00
Hadar Hen Zion	449c709c38	tc/m_tunnel_key: Add dest UDP port to tunnel key action Enhance tunnel key action parameters by adding destination UDP port. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>	2016-12-13 10:15:11 -08:00
Hadar Hen Zion	41aa17ff46	tc/cls_flower: Add dest UDP port to tunnel params Enhance IP tunnel parameters by adding destination UDP port. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>	2016-12-13 10:15:11 -08:00
Simon Horman	eb3b5696f1	tc: flower: support matching on ICMP type and code Support matching on ICMP type and code. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: flower \ indev eth0 ip_proto icmp type 8 code 0 action drop tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 ip_proto icmpv6 type 128 code 0 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-09 12:46:34 -08:00
Simon Horman	6910d65661	tc: flower: introduce enum flower_endpoint Introduce enum flower_endpoint and use it instead of a bool as the type for parameterising source and destination. This is intended to improve readability and provide some type checking of endpoint parameters. Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-09 12:45:59 -08:00
Simon Horman	6bd5b80cdc	tc: flower: make use of flower_port_attr_type() safe and silent Make use of flower_port_attr_type() safe: * flower_port_attr_type() may return a valid index into tb[] or -1. Only access tb[] in the case of the former. * Do not access null entries in tb[] Also make usage silent - it is valid for ip_proto to be invalid, for example if it is not specified as part of the filter. Fixes: `a1fb0d4842` ("tc: flower: Support matching on SCTP ports") Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-05 10:13:26 -08:00
Simon Horman	61dff9ac10	tc: flower: correct name of ip_proto parameter to flower_parse_port() This corrects a typo. Fixes: `a1fb0d4842` ("tc: flower: Support matching on SCTP ports") Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-05 10:13:26 -08:00
Simon Horman	6ad7e60c1f	tc: flower: document SCTP ip_proto Add SCTP ip_proto to help text and man page. Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-12-05 10:13:26 -08:00
Amir Vadai	d57639a475	tc/act_tunnel: Introduce ip tunnel action This action could be used before redirecting packets to a shared tunnel device, or when redirecting packets arriving from such a device. The 'unset' action is optional. It is used to explicitly unset the metadata created by the tunnel device during decap. If not used, the metadata will be released automatically by the kernel. The 'set' operation will set the metadata with the specified values for the encap. For example, the following flower filter will forward all ICMP packets destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before redirecting, metadata for the vxlan tunnel is created using the tunnel_key action and its arguments: $ tc filter add dev net0 protocol ip parent ffff: \ flower \ ip_proto 1 \ dst_ip 11.11.11.2 \ action tunnel_key set \ src_ip 11.11.0.1 \ dst_ip 11.11.0.2 \ id 11 \ action mirred egress redirect dev vxlan0 Signed-off-by: Amir Vadai <amir@vadai.me>	2016-12-02 14:12:09 -08:00
Amir Vadai	bb9b63b18e	tc/cls_flower: Classify packet in ip tunnels Introduce classifying by metadata extracted by the tunnel device. Outer header fields - source/dest ip and tunnel id, are extracted from the metadata when classifying. For example, the following will add a filter on the ingress Qdisc of the shared vxlan device named 'vxlan0', to forward packets with outer src ip 11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be forwarded to tap device 'vnet0': $ tc filter add dev vxlan0 protocol ip parent ffff: \ flower \ enc_src_ip 11.11.0.2 \ enc_dst_ip 11.11.0.1 \ enc_key_id 11 \ dst_ip 11.11.11.1 \ action mirred egress redirect dev vnet0 Signed-off-by: Amir Vadai <amir@vadai.me>	2016-12-02 14:12:09 -08:00
Amir Vadai	aab0f61043	libnetlink: Introduce rta_getattr_be*() Add the utility functions rta_getattr_be16() and rta_getattr_be32(), and change existing code to use it. Signed-off-by: Amir Vadai <amir@vadai.me>	2016-12-02 14:12:09 -08:00
Stephen Hemminger	328374dcfe	Merge branch 'master' into net-next	2016-12-01 10:29:12 -08:00
Roman Mashak	98df0c81da	tc: distinguish Add/Replace filter operations Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-11-29 13:26:10 -08:00
Daniel Borkmann	e42256699c	bpf: make tc's bpf loader generic and move into lib This work moves the bpf loader into the iproute2 library and reworks the tc specific parts into generic code. It's useful as we can then more easily support new program types by just having the same ELF loader backend. Joint work with Thomas Graf. I hacked a rough start of a test suite to make sure nothing breaks [1] and looks all good. [1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Thomas Graf <tgraf@suug.ch>	2016-11-29 12:35:32 -08:00
Stephen Hemminger	512caeb273	tc: flower checkpatch cleanups break long lines and minor whitespace changes.	2016-11-29 11:48:52 -08:00
Simon Horman	a1fb0d4842	tc: flower: Support matching on SCTP ports Support matching on SCTP ports in the same way that matching on TCP and UDP ports is already supported. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 ip_proto sctp dst_port 80 \ action drop Signed-off-by: Simon Horman <simon.horman@netronome.com>	2016-11-29 11:44:46 -08:00
Stephen Hemminger	b932e6f372	tc: cleanup style of qdisc code Get rid of lingering mismatches with kernel style.	2016-11-29 11:41:58 -08:00
Roman Mashak	d42e1444f2	tc: print raw qdisc handle. This is v2 patch with fixed code indentation. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-11-29 11:41:58 -08:00
Roman Mashak	4b5451c4cd	tc: improved usage help for fw classifier. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-11-29 11:41:58 -08:00
Paul Blakey	d9c3995ab7	tc: flower: Fix usage message Remove left over usage from removal of eth_type argument. Fixes: `488b41d020` ('tc: flower no need to specify the ethertype') Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com>	2016-11-12 10:19:06 +03:00
Shmulik Ladkani	5eca0a3701	tc: m_mirred: Add support for ingress redirect/mirror So far, only the 'egress' direction was implemented. Allow specifying 'ingress' as the direction packet appears on the target interface. For example, this takes incoming 802.1q frames on veth0 and redirects them for input on dummy0: # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q basic \ action mirred ingress redirect dev dummy0 Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>	2016-10-26 11:20:47 -07:00
Daniel Borkmann	4710e46ec3	tc, ipt: don't enforce iproute2 dependency on iptables-devel Since `5cd1adba79` ("Update to current iptables headers") compilation of iproute2 broke for systems without iptables-devel package [1]. Reason is that even though we fall back to build m_ipt.c, the include depends on a xtables-version.h header, which only ships with iptables-devel. Machines not having this package fail compilation with: [...] CC m_ipt.o In file included from ../include/iptables.h:5:0, from m_ipt.c:17: ../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or directory compilation terminated. ../Config:31: recipe for target 'm_ipt.o' failed make[1]: *** [m_ipt.o] Error 1 The configure script only barks that package xtables was not found in the pkg-config search path. The generated Config then only contains f.e. TC_CONFIG_IPSET. In tc's Makefile we thus fall back to adding m_ipt.o to TCMODULES. m_ipt.c then includes the local include/iptables.h header copy, which includes the include/xtables.h copy. Latter then includes xtables-version.h, which only ships with iptables-devel. One way to resolve this is to skip this whole mess when pkg-config has no xtables config available. I've carried something along these lines locally for a while now, but it's just too annoying. :/ Build works fine now also when xtables.pc is not available. [1] http://www.spinics.net/lists/netdev/msg366162.html Fixes: `5cd1adba79` ("Update to current iptables headers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-10-26 10:58:22 -07:00
Jakub Kicinski	87e46a5198	tc: cls_bpf: handle skip_sw and skip_hw flags Add support for controlling hardware offload using (now standard) skip_sw and skip_hw flags in cls_bpf. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <simon.horman@netronome.com>	2016-10-17 05:27:59 -07:00
Stephen Hemminger	ec2e005fe5	tc_filter: style cleanup Break long lines and whitespace changes.	2016-10-12 15:21:13 -07:00
Jamal Hadi Salim	120f556d15	tc filters: add support to get individual filters by handle sudo $TC filter add dev $ETH parent ffff: prio 2 protocol ip \ u32 match u32 0 0 flowid 1:1 \ action ok sudo $TC filter add dev $ETH parent ffff: prio 1 protocol ip \ u32 match ip protocol 1 0xff flowid 1:10 \ action ok now dump to see all rules.. $TC -s filter ls dev $ETH parent ffff: protocol ip .... filter pref 1 u32 filter pref 1 u32 fh 801: ht divisor 1 filter pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10 (rule hit 0 success 0) match 00010000/00ff0000 at 8 (success 0 ) action order 1: gact action drop random type none pass val 0 index 6 ref 1 bind 1 installed 4 sec used 4 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 filter pref 2 u32 filter pref 2 u32 fh 800: ht divisor 1 filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 336 success 336) match 00000000/00000000 at 0 (success 336 ) action order 1: gact action pass random type none pass val 0 index 5 ref 1 bind 1 installed 38 sec used 4 sec Action statistics: Sent 24864 bytes 336 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 .... ..get filter 801::800 $TC -s filter get dev $ETH parent ffff: protocol ip \ handle 801:0:800 prio 2 u32 .... filter parent ffff: protocol ip pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10 (rule hit 260 success 130) match 00010000/00ff0000 at 8 (success 130 ) action order 1: gact action drop random type none pass val 0 index 6 ref 1 bind 1 installed 348 sec used 0 sec Action statistics: Sent 11440 bytes 130 pkt (dropped 130, overlimits 0 requeues 0) backlog 0b 0p requeues 0 .... ..get other one $TC -s filter get dev $ETH parent ffff: protocol ip \ handle 800:0:800 prio 2 u32 .... filter parent ffff: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 514 success 514) match 00000000/00000000 at 0 (success 514 ) action order 1: gact action pass random type none pass val 0 index 5 ref 1 bind 1 installed 506 sec used 4 sec Action statistics: Sent 35544 bytes 514 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 .... ..try something that doesn't exist $TC -s filter get dev $ETH parent ffff: protocol ip handle 800:0:803 prio 2 u32 ..... RTNETLINK answers: No such file or directory We have an error talking to the kernel ..... Note, the added NLM_F_ECHO is for backward compatibility. Old kernels (before Eric's patch) will not respond without it and newer kernels (after Eric's patch) will ignore it. In old kernels there is a side effect: In addition to a response to the GET you will receive an event (if you do tc mon). But this is still better than what it was before (not working at all). Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:14:47 -07:00
Stephen Hemminger	557b705445	tc: skbmod style cleanup break long lines	2016-10-12 15:12:51 -07:00
Jamal Hadi Salim	da65128998	actions: add skbmod action This action is intended to be an upgrade from a usability perspective from pedit (as well as operational debuggability). Compare this: sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action pedit munge offset -14 u8 set 0x02 \ munge offset -13 u8 set 0x15 \ munge offset -12 u8 set 0x15 \ munge offset -11 u8 set 0x15 \ munge offset -10 u16 set 0x1515 \ pipe to: sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action skbmod dmac 02:15:15:15:15:15 Or worse, try to debug a policy with destination mac, source mac and ethertype. Then make that a hundred rules and you'll get my point. The most important ethernet use case at the moment is when redirecting or mirroring packets to a remote machine. The dst mac address needs a re-write so that it doesn't get dropped or confuse an interconnecting (learning) switch or dropped by a target machine (which looks at the dst mac). In the future common use cases on pedit can be migrated to this action (as an example different fields in ip v4/6, transports like tcp/udp/sctp etc). For this first cut, this allows modifying basic ethernet header. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:09:52 -07:00
Craig Dillabaugh	883c6708e4	action gact: list pipe as a valid action Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:09:52 -07:00
Jamal Hadi Salim	8da6ff35cd	actions ife: Introduce encoding and decoding of tcindex metadata Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:09:52 -07:00
Roman Mashak	1b600f4b54	ife: improve help text Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:09:52 -07:00
Roman Mashak	57ee4430f9	ife: print prio, mark and hash as unsigned Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:09:52 -07:00
Roman Mashak	9a56cca3f3	ife action: allow specifying index in hex Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-10-12 15:09:52 -07:00
Eric Dumazet	39f8caeb96	tc: fq: display unthrottle latency In linux-4.9 fq packet scheduler got a new stat : unthrottle_latency in nano second units. Gives a good indication of system load or timer implementation latencies. Signed-off-by: Eric Dumazet <edumazet@google.com>	2016-10-09 19:15:13 -07:00
Shmulik Ladkani	4654173e90	tc: m_vlan: Add vlan modify action The 'vlan modify' action allows to replace an existing 802.1q tag according to user provided settings. It accepts same arguments as the 'vlan push' action. For example, this replaces vid 6 with vid 5: # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q \ basic match 'meta(vlan mask 0xfff eq 6)' \ action vlan modify id 5 continue Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>	2016-10-09 19:11:34 -07:00
Stephen Hemminger	d54e3ab985	Merge branch 'master' into net-next	2016-10-09 18:53:52 -07:00
Sushma Sitaram	58d93d0030	tc: f_u32: Fill in 'linkid' provided by user Currently, 'linkid' input by the user is parsed but 'handle' is appended to the netlink message. # tc filter add dev enp1s0f1 protocol ip parent ffff: prio 99 u32 ht 800: \ order 1 link 1: offset at 0 mask 0f00 shift 6 plus 0 eat match ip \ protocol 6 ff resulted in: filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 match 00060000/00ff0000 at 8 offset 0f00>>6 at 0 eat This patch results in: filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 link 1: match 00060000/00ff0000 at 8 offset 0f00>>6 at 0 eat Signed-off-by: Sushma Sitaram <sushma.sitaram@intel.com>	2016-10-09 18:51:00 -07:00
Stephen Hemminger	36923f4e69	Merge branch 'master' into net-next	2016-09-20 09:50:53 -07:00
Davide Caratti	087dec7fcf	tc: don't accept qdisc 'handle' greater than ffff since get_qdisc_handle() truncates the input value to 16 bit, return an error and prompt "invalid qdisc ID" in case input 'handle' parameter needs more than 16 bit to be stored. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc>	2016-09-20 09:44:59 -07:00
Stephen Hemminger	88ba11bc08	Merge branch 'master' into net-next	2016-09-01 09:11:10 -07:00
Stephen Hemminger	ae810982cc	remove useless return statement Get rid of: void foo() { ... return; }	2016-09-01 08:44:20 -07:00
Stephen Hemminger	98a2af1d40	Merge branch 'master' into net-next	2016-09-01 08:39:15 -07:00
Hadar Hen Zion	0e43ed9dea	tc: m_vlan: Add priority option to push vlan action The current vlan push action supports only vid and protocol options. Add priority option. Example script that adds vlan push action with vid and priority: tc filter add dev veth0 protocol ip parent ffff: \ flower \ indev veth0 \ action vlan push id 100 priority 5 Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com>	2016-09-01 08:38:41 -07:00
Hadar Hen Zion	745d917260	tc: flower: Introduce vlan support Classification according to vlan id and vlan priority. Example script that adds vlan filter: # add ingress qdisc tc qdisc add dev ens4f0 ingress # add a flower filter with vlan id and priority classification tc filter add dev ens4f0 protocol 802.1Q parent ffff: \ flower \ indev ens4f0 \ vlan_ethtype ipv4 \ vlan_id 100 \ vlan_prio 3 \ action vlan pop Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com>	2016-09-01 08:38:41 -07:00
Yotam Gigi	d5cbf3ff05	tc: Add support for the matchall traffic classifier. The matchall classifier matches every packet and allows the user to apply actions on it. In addition, it supports the skip_sw and skip_hw (as can be found on u32 and flower filter) that direct the kernel to skip the software/hardware processing of the actions. This filter is very useful in usecases where every packet should be matched. For example, packet mirroring (SPAN) can be setup very easily using that filter. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com>	2016-09-01 08:37:01 -07:00
Roman Mashak	3de88c4b47	police: improve usage message Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-08-29 10:54:40 -07:00
Roman Mashak	cef49e514a	police: add extra space to improve police result printing Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-08-29 10:54:40 -07:00
Jamal Hadi Salim	06be01f75d	tc classifiers: Modernize tcindex classifier Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-08-22 10:08:00 -07:00
WANG Cong	6fcf36c9c6	tc: fix a misleading failure Before this patch: # ./tc/tc actions add action drop index 11 RTNETLINK answers: File exists We have an error talking to the kernel Command "(null)" is unknown, try "tc actions help". After this patch: # ./tc/tc actions add action drop index 11 RTNETLINK answers: File exists We have an error talking to the kernel Cc: Stephen Hemminger <shemming@brocade.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>	2016-08-09 11:18:14 -07:00
Stephen Hemminger	1b2594935e	Merge branch 'master' into net-next	2016-08-08 08:57:22 -07:00
Phil Sutter	c15feb99a4	tc/m_gact: Fix action_a2n() return code check The function returns zero on success. Reported-by: Mark Bloch <markb@mellanox.com> Fixes: `69f5aff63c` ("tc: use action_a2n() everywhere") Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-08-08 08:52:47 -07:00
Stephen Hemminger	6d54c41580	Merge branch 'master' into net-next	2016-08-08 08:44:07 -07:00
Phil Sutter	9579afb24e	tc: Fix for missing estimator initialization When switching to C99 initializers, I forgot to add this one. This means that when trying to set an estimator value, tc would complain about spurious duplicate estimator parameter. But much worse, the random variable content is sent to the kernel regardless of whether an estimator was given or not. Fixes: `d17b136f7d` ("Use C99 style initializers everywhere") Reported-by: Stas Nichiporovich <stasn77@gmail.com> Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-08-06 10:14:06 -07:00
Stephen Hemminger	79f5bf17a5	Merge branch 'master' into net-next	2016-07-25 08:21:00 -07:00
Phil Sutter	7093200611	tc: util: No need for action_n2a() to be reentrant This allows to remove some buffers here and there. While at it, make it return a const value. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-07-25 08:10:43 -07:00
Phil Sutter	69f5aff63c	tc: use action_a2n() everywhere Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-07-25 08:10:43 -07:00
Phil Sutter	53aadc5286	tc: util: bore up action_a2n() It's a pity this function is used nowhere, so let's polish it for use: * Loop over branch names, makes it clear that every former conditional was exactly identical. * Support 'pipe' branch name, too. * Make number parsing optional. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-07-25 08:10:43 -07:00
Phil Sutter	9ffc80b1e4	tc: Reformat tc_util.h * Drop 'extern' keyword before function declarations. * Add parameter names where they were missing for matters of consistency. * Drop fancy indenting (e.g. tab between type and name). * Break long lines to not exceed 80 columns. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-07-25 08:10:43 -07:00
Stephen Hemminger	ac75d5cd36	Merge branch 'master' into net-next	2016-07-20 12:21:42 -07:00
Phil Sutter	247ace6115	tc: ematch: Ignore all-zero mask value when printing filters The optional mask which may be added to int values is considered by the kernel only if it is non-zero, therefore tc should only then also print it. Without this, not passing a mask value like so: | # tc filter add dev d0 parent 8001: \ | basic match meta\(vlan eq 1\) \ | classid 8001:1 Would lead to tc printing an all-zero mask later: | # tc filter show dev d0 | filter parent 8001: protocol all pref 49151 basic | filter parent 8001: protocol all pref 49151 basic handle 0x1 flowid 8001:1 | meta(vlan mask 0x00000000 eq 1) This is obviously confusing as an all-zero mask strictly means to eliminate all bits from the value, but the opposite is the case. Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-07-20 12:20:13 -07:00
Phil Sutter	30a8842c49	No need to initialize rtattr fields before parsing Since parse_rtattr_flags() calls memset already, there is no need for callers to do so themselves. Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: David Ahern <dsa@cumulusnetworks.com>	2016-07-20 12:05:24 -07:00
Phil Sutter	f89bb0210f	Replace malloc && memset by calloc This only replaces occurrences where the newly allocated memory is cleared completely afterwards, as in other cases it is a theoretical performance hit although code would be cleaner this way. Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: David Ahern <dsa@cumulusnetworks.com>	2016-07-20 12:05:24 -07:00
Phil Sutter	d17b136f7d	Use C99 style initializers everywhere This big patch was compiled by vimgrepping for memset calls and changing to C99 initializer if applicable. One notable exception is the initialization of union bpf_attr in tc/tc_bpf.c: changing it would break for older gcc versions (at least <=3.4.6). Calls to memset for struct rtattr pointer fields for parse_rtattr*() were just dropped since they are not needed. The changes here allowed the compiler to discover some unused variables, so get rid of them, too. Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: David Ahern <dsa@cumulusnetworks.com>	2016-07-20 12:05:24 -07:00
Phil Sutter	d892aaf740	tc: m_action: Improve conversion to C99 style initializers This improves my initial change in the following points: - Flatten embedded struct's initializers. - No need to initialize variables to zero as the key feature of C99 initializers is to do this implicitly. - By relocating the declaration of struct rtattr *tail, it can be initialized at the same time. Fixes: `a0a73b298a` ("tc: m_action: Use C99 style initializers for struct req") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: David Ahern <dsa@cumulusnetworks.com>	2016-07-20 12:05:24 -07:00
Daniel Borkmann	e77fa41d4c	bpf: also check elf for official e_machine value Use the official BPF ELF e_machine value that was assigned recently [1] and will be propagated to glibc, libelf et al. LLVM will switch to it in 3.9 release, therefore we need to prepare tc to check for EM_BPF as well, older versions still have the EM_NONE. [1] `36b9c09330` Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2016-07-20 11:54:53 -07:00
Stephen Hemminger	d5b62e6439	Merge branch 'master' into net-next	2016-07-06 21:29:32 -07:00
Amir Vadai	cfcabf18d8	tc: flower: Add skip_{hw|sw} support On devices that support TC flower offloads, these flags enable a filter to be added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive flags. By default without any flags, the filter is added to both HW and SW, but no error checks are done in case of failure to add to HW. With skip-sw, failure to add to HW is treated as an error. Here is a sample script that adds 2 filters, one with skip_sw and the other with skip_hw flag. # add ingress qdisc tc qdisc add dev enp0s9 ingress # enable hw tc offload. ethtool -K enp0s9 hw-tc-offload on # add a flower filter with skip-sw flag. tc filter add dev enp0s9 protocol ip parent ffff: flower \ ip_proto 1 indev enp0s9 skip_sw \ action drop # add a flower filter with skip-hw flag. tc filter add dev enp0s9 protocol ip parent ffff: flower \ ip_proto 3 indev enp0s9 skip_hw \ action drop Signed-off-by: Amir Vadai <amirva@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com>	2016-07-06 21:24:48 -07:00
Jamal Hadi Salim	1d1e0fd29b	actions: skbedit add support for mod-ing skb pkt_type I'll make a formal submission sans the header when the kernel patches makes it in. This version is for someone who wants to play around with the net-next kernel patches i sent Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-07-06 21:15:44 -07:00
Phil Sutter	5f6a467f59	tc: m_action: Drop unused variable nladdr in tc_action_gd() This has been there since the introduction of tc/m_action.c back in 2004 and was apparently never in use. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-16 09:41:55 -07:00
Phil Sutter	a0a73b298a	tc: m_action: Use C99 style initializers for struct req Instead of initializing fields after (or sometimes even before) zeroing the whole struct via memset(), initialize the whole thing at declaration time. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-16 09:41:55 -07:00
Alexander Aring	9b32f89693	tc: let m_ipt work with new iptables API headers Since commit `5cd1adb` ("Update to current iptables headers") the build with m_ipt.o and the following config will fail: TC_CONFIG_XT:=n TC_CONFIG_XT_OLD:=n TC_CONFIG_XT_OLD_H:=n This patch renames "iptables_target" to "xtables_target" and some other things which got renamed, as I noticed while reading the iptables git log. Functions which are not used in m_ipt.c and not exported by the header are removed; if they are still used in m_ipt.c, I added a static to the function. Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Alexander Aring <aar@pengutronix.de>	2016-06-14 18:03:30 -07:00
Stephen Hemminger	4b83a08c28	m_xt: whitespace cleanup Make it 99% checkpatch clean.	2016-06-14 14:40:53 -07:00
Phil Sutter	2ef4008585	tc: m_xt: Introduce get_xtables_target_opts() This pulls common code from parse_ipt() and print_ipt() functions together. While here, also fix for incorrect use of the global 'optarg' variable in print_ipt(). Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	f6ddd9c5da	tc: m_xt: Simplify argc adjusting in parse_ipt() And while at it, also improve the error message in case too few parameters have been given. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	28432f370e	tc: m_xt: Get rid of iargc variable in parse_ipt() After dropping the unused decrement of argc in the function's tail, it can fully take over what iargc has been used for. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	ab8f52fc4a	tc: m_xt: Get rid of rargc in parse_ipt() No need to copy the passed parameter, it's changed only once right before function return. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	b0ba018576	tc: m_xt: Drop unused variable fw in parse_ipt() Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	b45f9141c2	tc: m_xt: Get rid of one indentation level in parse_ipt() Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	f1a7c7d830	tc: m_xt: Fix indenting By exiting early if xtables_find_target() fails, one indenting level can be dropped. Some of the wrongly indented code then happens to sit at the right spot by accident which is why this patch is smaller than expected. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	8eee75a835	tc: m_xt: Fix segfault when adding multiple actions at once Without this, the following call to tc would segfault: | tc filter add dev d0 parent ffff: u32 match u32 0 0 \ | action xt -j MARK --set-mark 0x1 \ | action xt -j MARK --set-mark 0x1 The reason is basically the same as for `6e2e5ec28b` ("fix print_ipt: segfault if more then one filter with action -j MARK.") but in parse_ipt() instead of print_ipt(). Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Phil Sutter	445745221a	tc: m_xt: Prevent segfault with standard targets Iptables standard targets like DROP or REJECT don't implement the print callback in libxtables. Hence the following command would segfault: | tc filter add dev d0 parent ffff: u32 match u32 0 0 action xt -j DROP With this patch standard targets still can't be used (and are not really useful anyway), but at least it doesn't crash anymore. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-06-14 14:35:56 -07:00
Stephen Hemminger	8b625177ba	pedit: fix whitespace etc Minor changes from checkpatch	2016-06-14 14:32:27 -07:00
Jamal Hadi Salim	d8694a30a4	action pedit: stylistic changes More modern layout. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-06-14 14:29:20 -07:00
Stephen Hemminger	622812052a	tc: f_u32 cleanup indentation and long lines Several long lines and too long messages here.	2016-06-08 16:45:26 -07:00
Samudrala, Sridhar	5e5b3008d1	tc: f_u32: Add support for skip_hw and skip_sw flags On devices that support TC U32 offloads, these flags enable a filter to be added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive flags. By default without any flags, the filter is added to both HW and SW, but no error checks are done in case of failure to add to HW. With skip-sw, failure to add to HW is treated as an error. Here is a sample script that adds 2 filters, one with skip_sw and the other with skip_hw flag. # add ingress qdisc tc qdisc add dev p4p1 ingress # enable hw tc offload. ethtool -K p4p1 hw-tc-offload on # add u32 filter with skip-sw flag. tc filter add dev p4p1 parent ffff: protocol ip prio 99 \ handle 800:0:1 u32 ht 800: flowid 800:1 \ skip-sw \ match ip src 192.168.1.0/24 \ action drop # add u32 filter with skip-hw flag. tc filter add dev p4p1 parent ffff: protocol ip prio 99 \ handle 800:0:2 u32 ht 800: flowid 800:2 \ skip-hw \ match ip src 192.168.2.0/24 \ action drop Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>	2016-06-08 16:39:30 -07:00
Sabrina Dubroca	9f7401fa49	utils: add get_be{16, 32, 64}, use them where possible Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Phil Sutter <phil@nwl.cc>	2016-06-08 09:30:37 -07:00
Eric Dumazet	4de4b5ca14	fq_codel: add per queue memory limit This patch adds support for TCA_FQ_CODEL_MEMORY_LIMIT attribute. .. qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0 requeues 21705223) rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414 ecn_mark 0 memory_used 4190048 drop_overmemory 4994406 new_flows_len 1 old_flows_len 177 Signed-off-by: Eric Dumazet <edumazet@google.com>	2016-06-08 08:42:00 -07:00
Jamal Hadi Salim	ead954cbd4	tc action policer: enable timestamp display Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-31 13:03:13 -07:00
Jamal Hadi Salim	82e6efe2e3	tc filter u32: Coding style fixes "handle" was being used several times for different things. Fix the 80 character limit abuse and other little issues while at it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-31 12:33:48 -07:00
Stephen Hemminger	e6263c8583	tc: action result is u32 In kernel action result is u32 not int in netlink messages.	2016-05-31 12:22:45 -07:00
Jamal Hadi Salim	45c6837911	tc action policer: Avoid nonsensical input The user must at least specify a choice of the token bucket or ewma policing or late binding index. TB policing requires at minimal a rate and burst. In addition fix formatting issues (80 chars etc). Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-31 12:16:45 -07:00
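The input rules stated in 45c6837911 can be sketched as a small validator. This is a Python illustration only; `validate_policer` and its parameter names are assumptions made for the example and do not mirror the parser in tc.

```python
def validate_policer(rate=None, burst=None, avrate=None, index=None) -> bool:
    """Reject policer specifications that select no policing mode at all."""
    # At least one of: token bucket (rate), ewma (avrate), late binding (index).
    if rate is None and avrate is None and index is None:
        raise ValueError("need a rate, an avrate or an index")
    # Token bucket policing requires at least a rate and a burst.
    if rate is not None and burst is None:
        raise ValueError("token bucket policing needs both rate and burst")
    return True


if __name__ == "__main__":
    print(validate_policer(rate="1mbit", burst="10k"))
    print(validate_policer(index=5))
```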
David Ahern	57bdf8b764	Make builds default to quiet mode Similar to the Linux kernel and perf add infrastructure to reduce the amount of output tossed to a user during a build. Full build output can be obtained with 'make V=1' Builds go from: make[1]: Leaving directory `/home/dsa/iproute2.git/lib' make[1]: Entering directory `/home/dsa/iproute2.git/ip' gcc -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -c -o ip.o ip.c gcc -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -c -o ipaddress.o ipaddress.c to: ... AR libutil.a ip CC ip.o CC ipaddress.o ... Signed-off-by: David Ahern <dsa@cumulusnetworks.com>	2016-05-31 12:13:07 -07:00
Jamal Hadi Salim	e70b9f16ea	tc simple action: bug fix Failed compile m_simple.c: In function ‘parse_simple’: m_simple.c:154:6: warning: too many arguments for format [-Wformat-extra-args] *argv); ^ m_simple.c:103:14: warning: unused variable ‘maybe_bind’ [-Wunused-variable] Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-31 12:11:52 -07:00
Jamal Hadi Salim	a78a2dba27	tc fix ife late binding The following late binding didn't work: sudo tc actions add action ife encode \ type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-23 16:15:31 -07:00
Daniel Borkmann	1a0320727c	f_bpf: fix filling of handle when no further arg is provided We need to fill handle when provided by the user, even if no further argument is provided. Thus, move the test for arg to the correct location, so that it works correctly: # tc filter show dev foo egress filter protocol all pref 1 bpf filter protocol all pref 1 bpf handle 0x1 bpf.o:[classifier] direct-action filter protocol all pref 1 bpf handle 0x2 bpf.o:[classifier] direct-action # tc filter del dev foo egress prio 1 handle 2 bpf # tc filter show dev foo egress filter protocol all pref 1 bpf filter protocol all pref 1 bpf handle 0x1 bpf.o:[classifier] direct-action Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-05-23 16:14:18 -07:00
Daniel Borkmann	a2de651e64	ingress, clsact: don't add TCA_OPTIONS to nl msg In ingress and clsact qdisc TCA_OPTIONS are ignored, since it's parameterless. In tc, we add an empty addattr_l(... TCA_OPTIONS, NULL, 0) to the netlink message nevertheless. This has the side effect that when someone tries a 'tc qdisc replace' and already an existing such qdisc is present, tc fails with EINVAL here. Reason is that in the kernel, this invokes qdisc_change() when such requested qdisc is already present. When TCA_OPTIONS are passed to modify parameters, it looks whether qdisc implements .change() callback, and if not present (like in both cases here) it returns with error. Rather than adding an empty stub to the kernel that ignores TCA_OPTIONS again, just don't add TCA_OPTIONS to the netlink message in the first place. Before: # tc qdisc replace dev foo clsact # first try # tc qdisc replace dev foo clsact # second one RTNETLINK answers: Invalid argument After: # tc qdisc replace dev foo clsact # tc qdisc replace dev foo clsact # tc qdisc replace dev foo clsact Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-05-16 11:20:50 -07:00
Jamal Hadi Salim	fdf1bdd0f1	tc simple action update and breakage Brings it closer to more serious actions (adding branching and allowing for late binding) Unfortunately this breaks old syntax of the simple action. But because simple is a pedagogical example unlikely to be used in production environments (i.e its role is to serve as an example on how to write actions), then this is ok. New syntax for simple has new keyword "sdata". Example usage is: sudo tc actions add action simple sdata "foobar" index 1 or tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action simple sdata "foobar" Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-16 11:15:12 -07:00
Jamal Hadi Salim	43726b750a	tc: don't ignore ok as an action branch This is what used to happen before: tc filter add dev tap1 parent ffff: protocol 0xfefe prio 10 \ u32 match u32 0 0 flowid 1:16 \ action ife decode allow mark ok tc -s filter ls dev tap1 parent ffff: filter protocol [65278] pref 10 u32 filter protocol [65278] pref 10 u32 fh 800: ht divisor 1 filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 match 00000000/00000000 at 0 action order 1: ife decode action pipe index 2 ref 1 bind 1 installed 4 sec used 4 sec type: 0x0 Metadata: allow mark Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: gact action pass random type none pass val 0 index 1 ref 1 bind 1 installed 4 sec used 4 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Note the extra action added at the end.. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-16 11:13:58 -07:00
Jamal Hadi Salim	d3e511223f	tc: introduce IFE action This action allows for a sending side to encapsulate arbitrary metadata which is decapsulated by the receiving end. The sender runs in encoding mode and the receiver in decode mode. Both sender and receiver must specify the same ethertype. At some point we hope to have a registered ethertype and we'll then provide a default so the user doesnt have to specify it. For now we enforce the user specify it. Described in netdev01 paper: "Distributing Linux Traffic Control Classifier-Action Subsystem" Authors: Jamal Hadi Salim and Damascene M. Joachimpillai Also refer to IETF draft-ietf-forces-interfelfb-04.txt Lets show example usage where we encode icmp from a sender towards a receiver with an skbmark of 17; both sender and receiver use ethertype of 0xdead to interop. YYYY: Lets start with Receiver-side policy config: xxx: add an ingress qdisc sudo tc qdisc add dev $ETH ingress xxx: any packets with ethertype 0xdead will be subjected to ife decoding xxx: we then restart the classification so we can match on icmp at prio 3 sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \ u32 match u32 0 0 flowid 1:1 \ action ife decode reclassify xxx: on restarting the classification from above if it was an icmp xxx: packet, then match it here and continue to the next rule at prio 4 xxx: which will match based on skb mark of 17 sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \ u32 match ip protocol 1 0xff flowid 1:1 \ action continue xxx: match on skbmark of 0x11 (decimal 17) and accept sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \ handle 0x11 fw flowid 1:1 \ action ok xxx: Lets show the decoding policy sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead xxx: filter pref 2 u32 filter pref 2 u32 fh 800: ht divisor 1 filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0) match 00000000/00000000 at 0 (success 0 ) action order 1: ife decode action 
reclassify type 0x0 allow mark allow prio index 11 ref 1 bind 1 installed 45 sec used 45 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: Observe that above lists all metadatum it can decode. Typically these submodules will already be compiled into a monolithic kernel or loaded as modules YYYY: Lets show the sender side now .. xxx: Add an egress qdisc on the sender netdev sudo tc qdisc add dev $ETH root handle 1: prio xxx: xxx: Match all icmp packets to 192.168.122.237/24, then xxx: tag the packet with skb mark of decimal 17, then xxx: Encode it with: xxx: ethertype 0xdead xxx: add skb->mark to whitelist of metadatum to send xxx: rewrite target dst MAC address to 02:15:15:15:15:15 xxx: sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \ match ip dst 192.168.122.237/24 \ match ip protocol 1 0xff \ flowid 1:2 \ action skbedit mark 17 \ action ife encode \ type 0xDEAD \ allow mark \ dst 02:15:15:15:15:15 xxx: Lets show the encoding policy filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 118 success 0) match c0a87a00/ffffff00 at 16 (success 0 ) match 00010000/00ff0000 at 8 (success 0 ) action order 1: skbedit mark 17 index 11 ref 1 bind 1 installed 3 sec used 3 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: ife encode action pipe type 0xDEAD allow mark dst 02:15:15:15:15:15 index 12 ref 1 bind 1 installed 3 sec used 3 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: Now test by sending ping from sender to destination Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-05-16 11:13:26 -07:00
Gustavo Zacarias	5c5a0f3df9	iproute2: tc_bpf.c: fix building with musl libc We need limits.h for PATH_MAX, fixes: tc_bpf.c: In function ‘bpf_map_selfcheck_pinned’: tc_bpf.c:222:12: error: ‘PATH_MAX’ undeclared (first use in this function) char file[PATH_MAX], buff[4096]; Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar> Acked-by: Daniel Borkmann <daniel@iogearbox.net>	2016-04-11 22:09:57 +00:00
Daniel Borkmann	4dd3f50af4	tc, bpf: add support for map pre/allocation Follow-up to kernel commit 6c9059817432 ("bpf: pre-allocate hash map elements"). Add flags support, so that we can pass in BPF_F_NO_PREALLOC flag for disallowing preallocation. Update examples accordingly and also remove the BPF_* map helper macros from them as they were not very useful. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-04-11 21:54:47 +00:00
Daniel Borkmann	afc1a2000b	tc, bpf: further improve error reporting Make it easier to spot issues when loading the object file fails. This includes reporting in what pinned object specs differ, better indication when we've reached instruction limits. Don't retry to load a non relo program once we failed with bpf(2), and report out of bounds tail call key. Also, add truncation of huge log outputs by default. Sometimes errors are quite easy to spot by only looking at the tail of the verifier log, but logs can get huge in size e.g. up to few MB (due to verifier checking all possible program paths). Thus, by default limit output to the last 4096 bytes and indicate that it's truncated. For the full log, the verbose option can be used. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-04-11 21:53:58 +00:00
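The log truncation described in afc1a2000b amounts to keeping only the tail of the verifier log unless verbose output is requested. Below is a minimal Python sketch of that idea; `tail_log` and its signature are hypothetical, and only the 4096-byte default comes from the commit message.

```python
LOG_TAIL_BYTES = 4096  # default limit named in the commit message


def tail_log(log: bytes, verbose: bool = False, limit: int = LOG_TAIL_BYTES):
    """Return (text_to_print, was_truncated) for a verifier log."""
    if verbose or len(log) <= limit:
        return log, False
    # Errors usually show up at the end, so keep the last `limit` bytes.
    return log[-limit:], True


if __name__ == "__main__":
    shown, truncated = tail_log(b"x" * 10000)
    print(len(shown), truncated)
```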
Jiri Pirko	4952b45946	include: add linked list implementation from kernel Rename hlist.h to list.h while adding it to be aligned with kernel Signed-off-by: Jiri Pirko <jiri@mellanox.com>	2016-03-27 10:56:11 -07:00
Stephen Hemminger	e9e9365b56	scrub out whitespace issues Run script that removes trailing whitespace everywhere.	2016-03-27 10:50:14 -07:00
Phil Sutter	7faf1588a7	lib/utils: introduce rt_addr_n2a_rta() This simple macro eases calling rt_addr_n2a() with data from an rt_attr pointer. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-27 10:37:35 -07:00
Phil Sutter	2e96d2ccd0	utils: make rt_addr_n2a() non-reentrant by default There is only a single user who needs it to be reentrant (not really, but it's safer like this), add rt_addr_n2a_r() for it to use. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-27 10:37:34 -07:00
Phil Sutter	a418e45164	make format_host non-reentrant by default There are only three users which require it to be reentrant, the rest is fine without. Instead, provide a reentrant format_host_r() for users which need it. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-27 10:37:34 -07:00
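The trade-off behind a418e45164 and 2e96d2ccd0 is that a formatter writing into one shared buffer is convenient but overwrites its previous result, while the `_r` variant lets the caller supply the buffer. The Python sketch below only imitates that C pattern with a module-level `bytearray`; `format_addr` and `format_addr_r` are hypothetical stand-ins, not the iproute2 functions.

```python
_shared_buf = bytearray()  # plays the role of a static buffer in C


def format_addr_r(addr: bytes, buf: bytearray) -> bytearray:
    """Reentrant variant: the caller owns the output buffer."""
    buf[:] = ".".join(str(b) for b in addr).encode("ascii")
    return buf


def format_addr(addr: bytes) -> bytearray:
    """Non-reentrant variant: every call reuses the same shared buffer."""
    return format_addr_r(addr, _shared_buf)


if __name__ == "__main__":
    first = format_addr(bytes([10, 0, 0, 1]))
    second = format_addr(bytes([10, 0, 0, 2]))
    # Both names refer to the shared buffer, so the first result is gone.
    print(bytes(first), bytes(second))
```

Callers that need two results alive at the same time are the ones that have to use the reentrant variant.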
Phil Sutter	51011dac36	tc/m_vlan.c: mention CONTROL option in help text Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2016-03-27 10:34:48 -07:00
Phil Sutter	1672f42195	tc: connmark, pedit: Rename BRANCH to CONTROL As Jamal suggested, BRANCH is the wrong name, as these keywords go beyond simple branch control - e.g. loops are possible, too. Therefore rename the non-terminal to CONTROL instead which should be more appropriate. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2016-03-27 10:34:42 -07:00
Phil Sutter	a33786b582	tc: pedit: Fix raw op The retain value was wrong for u16 and u8 types. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2016-03-27 10:34:36 -07:00
Phil Sutter	77bed404d0	tc: pedit: Fix for big-endian systems This was tricky to get right: - The 'stride' value used for 8 and 16 bit values must behave inverse to the value's intra word offset to work correctly with big-endian data act_pedit is editing. - The 'm' array's values are in host byte order, so they have to be converted as well (and the ordering was just inverse, for some reason). - The only sane way of getting this right is to manipulate value/mask in host byte order and convert the output. - TIPV4 (i.e. 'munge ip src/dst') had its own pitfall: the address parser converts to network byte order automatically. This patch fixes this by converting it back before calling pack_key32, which is a hack but at least does not require to implement a completely separate code flow. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2016-03-27 10:34:33 -07:00
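The approach named in 77bed404d0, building value and mask in host byte order and converting only the final words, can be sketched as follows. This is a Python illustration under assumptions: `pack_key8_sketch` is a hypothetical name, the returned mask marks the packet bits that are left untouched, and the real C helper may differ in detail.

```python
import struct


def pack_key8_sketch(val: int, retain: int, offset: int):
    """Place an 8-bit edit inside the 32-bit word that contains `offset`.

    Returns (aligned_offset, value_bytes, keep_mask_bytes), the two words
    in network byte order.
    """
    if not (0 <= val <= 0xFF and 0 <= retain <= 0xFF):
        raise ValueError("8-bit value and retain expected")
    ind = offset & 3
    # The shift shrinks as the intra-word offset grows: byte 0 of a
    # big-endian word is its most significant byte.
    shift = 8 * (3 - ind)
    host_val = (val & retain) << shift
    host_keep = 0xFFFFFFFF & ~(retain << shift)
    # Convert to network byte order only at the very end.
    return offset & ~3, struct.pack("!I", host_val), struct.pack("!I", host_keep)


if __name__ == "__main__":
    # Rewrite the byte at offset 9 (the IPv4 protocol field) to 0x11.
    print(pack_key8_sketch(0x11, 0xFF, 9))
```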
Phil Sutter	952f89deba	tc/p_ip.c: Minor coding style cleanup Break overlong function definitions and remove one extraneous whitespace. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2016-03-27 10:34:22 -07:00
Stephen Hemminger	32a121cba2	tc: code cleanup Use checkpatch to fix whitespace and other style issues.	2016-03-21 11:48:36 -07:00
Luca Lemmo	4733b18a5e	tc: q_{codel,fq_codel}: add missing space in help text Signed-off-by: Luca Lemmo <luca@linux.com>	2016-03-21 11:42:13 -07:00
Luca Lemmo	725f2a872d	tc: f_u32: trivial coding style cleanups Signed-off-by: Luca Lemmo <luca@linux.com>	2016-03-21 11:42:12 -07:00
Luca Lemmo	dd0c8d193f	tc: f_u32: add missing spaces around operators Signed-off-by: Luca Lemmo <luca@linux.com>	2016-03-21 11:42:12 -07:00
Phil Sutter	338b003bcc	tc: pedit: Fix retain value for ihl adjustments Since the IP Header Length field is just half a byte, adjust retain to only match these bits so the Version field is not overwritten by accident. The whole concept is actually broken due to dependency on endianness which pedit ignores. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-06 12:53:11 -08:00
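A worked example of why the retain value matters for the IHL field in 338b003bcc: the first byte of an IPv4 header carries the Version in its high nibble and the IHL in its low nibble. The Python sketch below models a masked byte rewrite; `apply_edit` is a hypothetical helper and the sample header byte 0x45 (version 4, IHL 5) is an assumption for illustration.

```python
def apply_edit(old_byte: int, val: int, retain: int) -> int:
    """Rewrite only the bits of `old_byte` that are set in `retain`."""
    return (old_byte & ~retain & 0xFF) | (val & retain)


if __name__ == "__main__":
    first_byte = 0x45  # version 4, IHL 5
    # retain 0x0f touches only the IHL nibble, the version survives.
    print(hex(apply_edit(first_byte, 0x06, 0x0F)))
    # retain 0xff would overwrite the version nibble as well.
    print(hex(apply_edit(first_byte, 0x06, 0xFF)))
```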
Phil Sutter	f440e9d8c2	tc: pedit: Fix parse_cmd() This was horribly broken: * pack_key8() and pack_key16() ... * missed to invert retain value when applying it to the mask, * did not sanitize val by ANDing it with retain, * and ignored the mask which is necessary for 'invert' command. * pack_key16() did not convert mask to network byte order. * Changing the retain value for 'invert' or 'retain' operation seems just plain wrong. * While here, also got rid of unnecessary offset sanitization in pack_key32(). * Simplify code a bit by always assigning the local mask variable to tkey->mask before calling any of the pack_key*() variants. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-06 12:53:11 -08:00
Phil Sutter	ec0ceeec49	tc: pedit: Fix layered op parsing After lookup of the layered op submodule, pedit would pass argv and argc including the layered op identifier at first position which confused the submodule parser. Fix this by calling NEXT_ARG() before calling the parse_peopt() callback. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-06 12:53:11 -08:00
Phil Sutter	c024acc641	tc: pedit: document branch control in help output This seems to have been a hidden feature, though it's very useful and necessary at least when combining multiple pedit actions. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-03-04 15:27:52 -08:00
Dmitrii Shcherbakov	467f9fce60	htb: rename b4 buffer to b3 to make its name more consistent b3 buffer has been deleted previously so b2 is followed by b4 which is not consistent. Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc>	2016-02-17 17:50:14 -08:00
Dmitrii Shcherbakov	1aea7fea26	htb: remove printing of a deprecated overhead value Remove printing according to the previously used encoding of mpu and overhead values within the tc_ratespec's mpu field. This encoding is no longer being used as a separate 'overhead' field in the ratespec structure has been introduced. Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc>	2016-02-17 17:49:47 -08:00
Daniel Borkmann	5230a2ede0	tc, bpf: use bind/type macros from gelf Don't reimplement them and rather use the macros from the gelf header, that is, GELF_ST_BIND()/GELF_ST_TYPE(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-02-07 11:27:38 -08:00
Daniel Borkmann	a576c6b977	tc, bpf: give some more hints wrt false relos Provide some more hints to the user/developer when relos have been found that don't point to ld64 imm instruction. Ran a couple of times into relos generated by clang [1], where the compiler tried to uninline inlined functions with eBPF and emitted BPF_JMP | BPF_CALL opcodes. If this seems the case, give a hint that the user should do a work-around to use always_inline annotation. [1] https://llvm.org/bugs/show_bug.cgi?id=26243#c3 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-02-07 11:27:38 -08:00
Daniel Borkmann	f31645d138	tc, bpf: improve verifier logging With a bit larger, branchy eBPF programs f.e. already ~BPF_MAXINSNS/7 in size, it happens rather quickly that bpf(2) rejects also valid programs when only the verifier log buffer size we have in tc is too small. Change that, so by default we don't do any logging, and only in error case we retry with logging enabled. If we should fail providing a reasonable dump of the verifier analysis, retry few times with a larger log buffer so that we can at least give the user a chance to debug the program. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com>	2016-02-07 11:27:38 -08:00
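The retry strategy described in f31645d138 (no log by default, log only after a failure, larger buffer if the log did not fit) can be sketched like this. It is a Python illustration only: `load_with_log_retry`, the `try_load` callback contract, the starting size, the doubling factor and the retry count are all assumptions, not values taken from tc.

```python
def load_with_log_retry(try_load, first_log_size=64 * 1024, max_log_tries=3):
    """Load a program, asking for a verifier log only after a failure.

    `try_load(log_size)` is a caller-supplied function returning
    (ok, log_text, log_was_truncated).
    """
    ok, _, _ = try_load(0)  # fast path: no log buffer at all
    if ok:
        return True, ""
    size, log = first_log_size, ""
    for _ in range(max_log_tries):
        ok, log, truncated = try_load(size)
        if ok or not truncated:
            break
        size *= 2  # buffer was too small for the log, retry with more room
    return ok, log


if __name__ == "__main__":
    def always_ok(log_size):
        return True, "", False

    print(load_with_log_retry(always_ok))
```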
Nicolas Dichtel	67584e3ab2	tc: fix compilation with old gcc (< 4.6) (bis) Commit `8f80d450c3` ("tc: fix compilation with old gcc (< 4.6)") was reverted to ease the merge of the net-next branch. Here is the new version. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-02-05 11:46:18 +11:00
Daniel Borkmann	2486337aac	tc, bpf: make sure relo is in relation with map section Add a test that symbol from relocation entry is actually related to map section and bail out with an error message if it's not the case; in relation to [1]. [1] https://llvm.org/bugs/show_bug.cgi?id=26243 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2016-02-02 16:04:11 +11:00
Stephen Hemminger	62392ecbbb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2	2016-02-02 15:57:23 +11:00
Daniel Borkmann	8187b01273	tc, bpf: more header checks on loading elf eBPF llvm backend can support different BPF formats, make sure the object we're trying to load matches with regards to endianness and while at it, also check for other attributes related to BPF ELFs. # llc --version LLVM (http://llvm.org/): LLVM version 3.8.0svn Optimized build. Built Jan 9 2016 (02:08:10). Default target: x86_64-unknown-linux-gnu Host CPU: ivybridge Registered Targets: bpf - BPF (host endian) bpfeb - BPF (big endian) bpfel - BPF (little endian) [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2016-01-18 11:41:27 -08:00
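The endianness check mentioned in 8187b01273 can be illustrated by reading the data-encoding byte of the ELF identification. This Python sketch covers only that one check; the function names are hypothetical, and which other attributes tc verifies is not stated in the commit message, so none are modelled here.

```python
import sys

ELF_MAGIC = b"\x7fELF"
EI_DATA = 5        # index of the data-encoding byte in e_ident
ELFDATA2LSB = 1    # little-endian object
ELFDATA2MSB = 2    # big-endian object


def elf_byte_order(e_ident: bytes) -> str:
    """Return 'little' or 'big' for the given ELF identification bytes."""
    if len(e_ident) <= EI_DATA or e_ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF object")
    encoding = e_ident[EI_DATA]
    if encoding == ELFDATA2LSB:
        return "little"
    if encoding == ELFDATA2MSB:
        return "big"
    raise ValueError("unknown ELF data encoding")


def check_matches_host(e_ident: bytes, host_order: str = sys.byteorder) -> str:
    """Refuse objects whose byte order differs from the host's."""
    order = elf_byte_order(e_ident)
    if order != host_order:
        raise ValueError(f"object is {order}-endian, host is {host_order}-endian")
    return order


if __name__ == "__main__":
    ident = ELF_MAGIC + bytes([2, ELFDATA2LSB, 1]) + bytes(9)
    print(elf_byte_order(ident))
```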
Daniel Borkmann	cce3d4664c	tc, bpf: check section names and type everywhere When extracting sections, we better check for name and type. Noticed that some llvm versions emit .strtab and .shstrtab (e.g. saw it on pre 3.7), while more recent ones only seem to emit .strtab. Thus, make sure we get the right sections. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2016-01-18 11:41:27 -08:00
Daniel Borkmann	8f9afdd531	tc, clsact: add clsact frontend Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add clsact qdisc"). Quoting example usage from that commit description: Example, adding qdisc: # tc qdisc add dev foo clsact # tc qdisc show dev foo qdisc mq 0: root qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc clsact ffff: parent ffff:fff1 Adding filters (deleting, etc works analogous by specifying ingress/egress): # tc filter add dev foo ingress bpf da obj bar.o sec ingress # tc filter add dev foo egress bpf da obj bar.o sec egress # tc filter show dev foo ingress filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action # tc filter show dev foo egress filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action The ingress parent alias can also be used with ingress qdisc. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-01-18 11:41:27 -08:00
Daniel Borkmann	0d45c4b420	tc, ingress: clean up ingress handling a bit Clean it up a bit, we can also get rid of some ugly ifdefs as in our case TC_H_INGRESS is always defined. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2016-01-18 11:41:27 -08:00
Stephen Hemminger	2505780c20	Merge branch 'net-next'	2016-01-18 09:37:45 -08:00
Stephen Hemminger	bc223ab861	Revert "tc: fix compilation with old gcc (< 4.6)" This reverts commit `8f80d450c3`.	2016-01-18 09:37:38 -08:00
Jamal Hadi Salim	488b41d020	tc: flower no need to specify the ethertype since all tc classifiers are required to specify ethertype as part of grammar By not allowing eth_type to be specified we remove contradiction for example when a user specifies: tc filter add ... priority xxx protocol ip flower eth_type ipv6 This patch removes that contradiction Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2016-01-11 08:24:01 -08:00
Julien Floret	8f80d450c3	tc: fix compilation with old gcc (< 4.6) gcc < 4.6 does not handle C11 syntax for the static initialization of anonymous struct/union, hence the following error: tc_bpf.c:260: error: unknown field map_type specified in initializer Signed-off-by: Julien Floret <julien.floret@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net>	2016-01-11 08:23:36 -08:00
Phil Sutter	de7db5d857	tc: m_connmark: Fix help text When specifying a conntrack zone, the 'zone' keyword has to be used before the actual zone index. Signed-off-by: Phil Sutter <phil@nwl.cc>	2016-01-07 10:35:08 -08:00
Stephen Hemminger	e49b51d663	monitor: fix file handle leak In some cases passing file to monitor left file open. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2015-12-30 17:26:38 -08:00
Daniel Borkmann	fd7f9c7fd1	bpf: minor fix in api and bpf_dump_error() usage Fix a whitespace in bpf_dump_error() usage, and also a missing closing bracket in ntohl() macro for eBPF programs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-12-17 17:22:25 -08:00
Daniel Borkmann	91d88eeb10	{f,m}_bpf: allow updates on program arrays Since we have all infrastructure in place now, allow atomic live updates on program arrays. This can be very useful e.g. in case programs that are being tail-called need to be replaced, f.e. when classifier functionality needs to be changed, new protocols added/removed during runtime, etc. Thus, provide a way for in-place code updates, minimal example: Given is an object file cls.o that contains the entry point in section 'classifier', has a globally pinned program array 'jmp' with 2 slots and id of 0, and two tail called programs under section '0/0' (prog array key 0) and '0/1' (prog array key 1), the section encoding for the loader is <id/key>. Adding the filter loads everything into cls_bpf: tc filter add dev foo parent ffff: bpf da obj cls.o Now, the program under section '0/1' needs to be replaced with an updated version that resides in the same section (also full path to tc's subfolder of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp): tc exec bpf graft m:globals/jmp obj cls.o sec 0/1 In case the program resides under a different section 'foo', it can also be injected into the program array like: tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo If the new tail called classifier program is already available as a pinned object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected into the prog array like: tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser In the kernel, the program on key 1 is being atomically replaced and the old one's refcount dropped. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2015-11-29 11:55:16 -08:00
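The `<id/key>` section encoding used in 91d88eeb10 is simple to decode: section '0/1' means program array id 0, key 1. The following Python sketch shows one way to parse it; `parse_tail_call_section` is a hypothetical helper, and treating every other section name (such as 'classifier' or 'foo') as "not a tail call slot" is an assumption of this example.

```python
def parse_tail_call_section(name: str):
    """Return (array_id, key) for names like '0/1', otherwise None."""
    parts = name.split("/")
    if len(parts) != 2:
        return None
    if not all(p.isascii() and p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


if __name__ == "__main__":
    for section in ("0/0", "0/1", "classifier", "foo"):
        print(section, "->", parse_tail_call_section(section))
```

A program in a section that does not follow this encoding, such as 'foo' in the commit message, needs the key passed explicitly, as in the `key 1 ... sec foo` example.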
Daniel Borkmann	f6793eec46	{f, m}_bpf: allow for user-defined object pinnings The recently introduced object pinning can be further extended in order to allow sharing maps beyond tc namespace. F.e. maps that are being pinned from tracing side, can be accessed through this facility as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2015-11-29 11:55:16 -08:00
Daniel Borkmann	9e607f2e72	{f, m}_bpf: check map attributes when fetching as pinned Make use of the new show_fdinfo() facility and verify that when a pinned map is being fetched that its basic attributes are the same as the map we declared from the ELF file. I.e. when placed into the globalns, collisions could occur. In such a case warn the user and bail out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2015-11-29 11:55:16 -08:00
Daniel Borkmann	910b543dcc	{f,m}_bpf: make tail calls working Now that we have the possibility of sharing maps, it's time we get the ELF loader fully working with regards to tail calls. Since program array maps are pinned, we can keep them finally alive. I've noticed two bugs that are being fixed in bpf_fill_prog_arrays() with this patch. Example code comes as follow-up. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2015-11-29 11:55:16 -08:00
Daniel Borkmann	32e93fb7f6	{f,m}_bpf: allow for sharing maps This larger work addresses one of the bigger remaining issues on tc's eBPF frontend, that is, to allow for persistent file descriptors. Whenever tc parses the ELF object, extracts and loads maps into the kernel, these file descriptors will be out of reach after the tc instance exits. Meaning, for simple (unnested) programs which contain one or multiple maps, the kernel holds a reference, and they will live on inside the kernel until the program holding them is unloaded, but they will be out of reach for user space, even worse with (also multiple nested) tail calls. For this issue, we introduced the concept of an agent that can receive the set of file descriptors from the tc instance creating them, in order to be able to further inspect/update map data for a specific use case. However, while that is more tied towards specific applications, it still doesn't easily allow for sharing maps across multiple tc instances and would require a daemon to be running in the background. F.e. when a map should be shared by two eBPF programs, one attached to ingress, one to egress, this currently doesn't work with the tc frontend. This work solves exactly that, i.e. if requested, maps can now be _arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within a single object (but various program sections, PIN_OBJECT_NS) without "losing" the file descriptor set. To make that happen, we use eBPF object pinning introduced in kernel commit b2197755b263 ("bpf: add support for persistent maps/progs") for exactly this purpose. 
The shipped examples/bpf/bpf_shared.c code from this patch can be easily applied, for instance, as: - classifier-classifier shared: tc filter add dev foo parent 1: bpf obj shared.o sec egress tc filter add dev foo parent ffff: bpf obj shared.o sec ingress - classifier-action shared (here: late binding to a dummy classifier): tc actions add action bpf obj shared.o sec egress pass index 42 tc filter add dev foo parent ffff: bpf obj shared.o sec ingress tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \ action bpf index 42 The toy example increments a shared counter on egress and dumps its value on ingress (if no sharing (PIN_NONE) would have been chosen, map value is 0, of course, due to the two map instances being created): [...] <idle>-0 [002] ..s. 38264.788234: : map val: 4 <idle>-0 [002] ..s. 38264.788919: : map val: 4 <idle>-0 [002] ..s. 38264.789599: : map val: 5 [...] ... thus if both sections reference the pinned map(s) in question, tc will take care of fetching the appropriate file descriptor. The patch has been tested extensively on both, classifier and action sides. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-11-23 16:10:44 -08:00
Stephen Hemminger	037660b351	qfq: fix parse_opt dead code Fix Coverity warning from dead code.	2015-10-27 15:46:20 +09:00
Stephen Hemminger	86c392f958	Merge branch 'master' into net-next	2015-10-23 15:46:08 -07:00
Stephen Hemminger	753ef5bbd6	tc: remove extra whitespace No blank lines at EOF, or trailing whitespace.	2015-10-23 15:43:28 -07:00
Phil Sutter	40eb737ebb	tc: u32 filter coding style cleanup Add missing spaces around operators to increase readability. Aside from that, make "preference" match a real synonym for "tos" and "dsfield" as its effect was identical to them. Signed-off-by: Phil Sutter <phil@nwl.cc>	2015-10-23 15:37:26 -07:00
Phil Sutter	0a83e1eaf7	tc: improve filter help texts a bit This fixes a few syntax errors and changes route filter help text to use classid instead of flowid to be consistent with other filters' help texts. Signed-off-by: Phil Sutter <phil@nwl.cc>	2015-10-23 15:37:26 -07:00
Daniel Borkmann	343dc90854	m_bpf: don't require default opcode on ebpf actions After the patch, the most minimal command to load an eBPF action for late binding with auto index selection through tc is: tc actions add action bpf obj prog.o We already set TC_ACT_PIPE in tc as default opcode, so if nothing further has been specified, just use it. Also, allow "ok" next to "pass" for matching cmdline on TC_ACT_OK. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-10-12 09:44:52 -07:00
Daniel Borkmann	faa8a46300	f_bpf: allow for optional classid and add flags When having optional classid, most minimal command can be sth like: tc filter add dev foo parent X: bpf obj prog.o Therefore, adapt the code so that a next argument will not be enforced as the case currently. Also, minor cleanup on the classid, where we should rather have used addattr32(), and add flags for exec configuration, for example (using short notation): tc filter add dev foo parent X: bpf da obj prog.o Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com>	2015-10-12 09:41:05 -07:00
Stephen Hemminger	8fe9839857	fq: fix whitespace	2015-09-25 12:40:00 -07:00
Eric Dumazet	8d5bd8c302	tc: fq: allow setting and retrieving orphan_mask linux-3.19 fq packet scheduler got a new attribute, controlling number of 'flows' holding packets not attached to a socket (forwarding usage) kernel commit is 06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7 ("pkt_sched: fq: better control of DDOS traffic") This patch adds corresponding code to tc command. tc qd replace dev eth0 root fq orphan_mask 511 Signed-off-by: Eric Dumazet <edumazet@google.com>	2015-09-25 12:37:09 -07:00
Eric Dumazet	32a6fbe563	tc : add timestamps to tc monitor Support -timestamp and -tshort options for tc monitor like ip monitor. # tc -tshort monitor [2015-09-23T16:39:11.260555] qdisc fq 8003: dev eth0 root refcnt 2 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 refill_delay 40.0ms Signed-off-by: Eric Dumazet <edumazet@google.com>	2015-09-25 12:35:46 -07:00
Phil Sutter	565af7b816	tc: fq: allow setting and retrieving flow refill delay Code to parse and export this tuneable via netlink is already present in sched_fq.c of the kernel, so not making it accessible for users would be a waste of resources. Signed-off-by: Phil Sutter <phil@nwl.cc>	2015-09-23 16:02:13 -07:00
Phil Sutter	5c32fa1d69	comment: Fix remaining listings of wrong FSF address This patch follows the changes of commit `4d98ab0` ("Fix FSF address in file headers"), fixing file headers added after it. Signed-off-by: Phil Sutter <phil@nwl.cc>	2015-09-23 15:58:54 -07:00
Stephen Hemminger	9a6422c243	Merge branch 'master' into net-next	2015-08-13 19:42:41 -07:00
Stephen Hemminger	bcb4a7aa5b	tc: fix return after invarg	2015-08-13 14:20:40 -07:00
Daniel Borkmann	baed90842a	m_bpf: add frontend support for late binding Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly support late binding of bpf action to a classifier"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-08-10 11:19:11 -07:00
Nicolas Dichtel	611f70b287	tc: fix bpf compilation with old glibc Error was: f_bpf.o: In function `bpf_parse_opt': f_bpf.c:(.text+0x88f): undefined reference to `secure_getenv' m_bpf.o: In function `parse_bpf': m_bpf.c:(.text+0x587): undefined reference to `secure_getenv' collect2: error: ld returned 1 exit status There is no special reason to use the secure version of getenv, thus let's simply use getenv(). CC: Daniel Borkmann <daniel@iogearbox.net> Fixes: `88eea53954` ("tc: {f,m}_bpf: allow to retrieve uds path from env") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Yegor Yefremov <yegorslists@googlemail.com>	2015-07-27 14:35:42 -07:00
Stephen Hemminger	69be46c562	Merge branch 'master' into net-next	2015-06-26 00:04:04 -04:00
Daniel Borkmann	88eea53954	tc: {f,m}_bpf: allow to retrieve uds path from env Allow to retrieve uds path from the environment, facilitates also dealing with export a bit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-06-25 15:13:16 -04:00
Daniel Borkmann	473d7840c3	tc: {f,m}_bpf: add tail call support for parser Kernel commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") added support for tail calls, this patch here adds tc front end parts for the object parser to prepopulate a given eBPF prog array before the root prog is pushed down for classifier creation. The prepopulation works with any number of prog arrays in any dependencies, e.g. prog or normal maps could also be used from progs that are tail-called themselves, etc. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-06-25 15:13:16 -04:00
Maciej Żenczykowski	0bbca0422f	iproute2: tc/m_pedit.c - remove dead code The initializers are simply not needed. These if-blocks are outright dead code, because '0 > unsigned' is always false, so only else clause triggers and regardless of which clause triggers it only updates 'ind' which is later unconditionally written to before being used anyway. Otherwise we get errors from clang: m_pedit.c:166:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare] if (0 > tkey->off) { ~ ^ ~~~~~~~~~ m_pedit.c:209:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare] if (0 > tkey->off) { ~ ^ ~~~~~~~~~ 2 errors generated. Change-Id: I3c9e9092915088fc56f992e5df736851541a4458	2015-06-25 08:52:06 -04:00
Stephen Hemminger	f975059a51	Merge branch 'master' into net-next	2015-06-25 08:01:51 -04:00
Daniel Borkmann	ad1fe0d8e9	tc: util: fix print_rate for ludicrous speeds The for loop should only probe up to G[i]bit rates, so that we end up with T[i]bit as the last max units[] slot for snprintf(3), and not possibly an invalid pointer in case rate is multiple of kilo. Fixes: `8cecdc2837` ("tc: more user friendly rates") Reported-by: Jose R. Guzman Mosqueda <jose.r.guzman.mosqueda@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-06-24 23:34:20 -04:00
Stephen Hemminger	03371c7d98	Merge branch 'master' into net-next Conflicts: include/linux/tcp.h lib/libnetlink.c	2015-05-28 09:18:01 -07:00
Stephen Hemminger	c079e121a7	libnetlink: add size argument to rtnl_talk There have been several instances where response from kernel has overrun the stack buffer from the caller. Avoid future problems by passing a size argument. Also drop the unused peer and group arguments to rtnl_talk.	2015-05-27 13:00:21 -07:00
David Ward	aacee2695a	tc: gred: Add support for TCA_GRED_LIMIT attribute Allow the qdisc limit to be set, which is particularly useful when the default VQ is not configured with RED parameters. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 15:30:39 -07:00
Nicolas Dichtel	0628cddd9d	libnetlink: introduce rtnl_listen_filter_t There is no functional change with this commit. It only prepares the next one. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>	2015-05-21 15:28:56 -07:00
Eric Dumazet	df1c7d9138	codel: add ce_threshold support to codel & fq_codel codel & fq_codel packet schedulers are now able to have a threshold for CE marking packets, regardless of the drop/nodrop decision taken by CoDel. This is particularly useful for dctcp and variants, that do not use traditional ECN. Note that fq_codel users would have to specify noecn if ce_threshold is used, otherwise results would be not very interesting, as ecn is default on for fq_codel. $ tc -s qdisc show dev eth1 qdisc codel 8002: root refcnt 45 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 4908469888317 bytes 3351813967 pkt (dropped 0, overlimits 0 requeues 21624365) rate 37671Mbit 3231836pps backlog 4904740b 250p requeues 21624365 count 0 lastcount 0 ldelay 1.1ms drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 410861803 Signed-off-by: Eric Dumazet <edumazet@google.com>	2015-05-21 15:25:05 -07:00
Jiri Pirko	30eb304ecd	tc: add support for Flower classifier Signed-off-by: Jiri Pirko <jiri@resnulli.us>	2015-05-21 15:22:49 -07:00
David Ward	357c45ad3a	tc: gred: Adopt the term VQ in the command syntax and output In the GRED kernel source code, both of the terms "drop parameters" (DP) and "virtual queue" (VQ) are used to refer to the same thing. Each "DP" is better understood as a "set of drop parameters", since it has values for limit, min, max, avpkt, etc. This terminology can result in confusion when creating a GRED qdisc having multiple DPs. Netlink attributes and struct members with the DP name seem to have been left intact for compatibility, while the term VQ was otherwise adopted in the code, which is more intuitive. Use the VQ term in the tc command syntax and output (but maintain compatibility with the old syntax). Rewrite the usage text to be concise and similar to other qdiscs. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	eb6d7d6af1	tc: gred: Handle unsigned values properly in option parsing/printing DPs, def_DP, and DP are unsigned values that are sent and received in TCA_GRED_* netlink attributes; handle them properly when they are parsed or printed. Use MAX_DPs as the initial value for def_DP and DP, and fix the operator used for bounds checking them. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	1693a4d392	tc: gred: Improve parameter/statistics output Make the output more consistent with the RED qdisc, and only show details/statistics if the appropriate flag is set when calling tc. Show the parameters used with "gred setup". Add missing statistics "pdrop" and "other". Fix format specifiers for unsigned values. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	a77905ef6a	tc: gred: Print usage text if no arguments appear after "gred" This is more helpful to the user, since the command takes two forms, and the message that would otherwise appear about missing parameters assumes one of those forms. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	d73e0408e2	tc: gred: Fix whitespace issues in code Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	7bf17a2264	tc: red: Mark "bandwidth" parameter as optional in usage text Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	d93c909a4c	tc: red, gred: Notify when using the default value for "bandwidth" The "bandwidth" parameter is optional, but ensure the user is aware of its default value, to proactively avoid configuration problems. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	6c99695da2	tc: red, gred: Fix format specifier in burst size warning burst is an unsigned value. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
David Ward	9d9a67c756	tc: red, gred: Rename overloaded variable wlog It is used when parsing three different parameters, only one of which is Wlog. Change the name to make the code less confusing. Signed-off-by: David Ward <david.ward@ll.mit.edu>	2015-05-21 14:16:03 -07:00
Daniel Borkmann	ec6f5abcea	tc: minor cleanup on ingress Fix whitespacing and remove the unnecessary condition. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-05-11 09:18:10 -07:00
WANG Cong	285e7768e8	tc: fill in handle before checking argc When deleting a specific basic filter with handle, tc command always ignores the 'handle' option, so tcm_handle is always 0 and kernel deletes all filters in the selected group. This is wrong, we should respect 'handle' in cmdline. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>	2015-05-11 09:13:20 -07:00
Daniel Borkmann	d937a74b6d	tc: {m, f}_ebpf: add option for dumping verifier log Currently, only on error we get a log dump, but I found it useful when working with eBPF to have an option to also dump the log on success. Also spotted a typo in a header comment, which is fixed here as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com>	2015-05-04 08:43:08 -07:00
Daniel Borkmann	4bd624467b	tc: built-in eBPF exec proxy This work follows upon commit `6256f8c9e4` ("tc, bpf: finalize eBPF support for cls and act front-end") and takes up the idea proposed by Hannes Frederic Sowa to spawn a shell (or any other command) that holds generated eBPF map file descriptors. File descriptors, based on their id, are being fetched from the same unix domain socket as demonstrated in the bpf_agent, the shell spawned via execvpe(2) and the map fds passed over the environment, and thus are made available to applications in the fashion of std{in,out,err} for read/write access, for example in case of iproute2's examples/bpf/: # env | grep BPF BPF_NUM_MAPS=3 BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1) BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0) BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2) # ls -la /proc/self/fd [...] lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4 lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4 lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4 [...] lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map The advantage (as opposed to the direct/native usage) is that now the shell is map fd owner and applications can terminate and easily reattach to descriptors w/o any kernel changes. Moreover, multiple applications can easily read/write eBPF maps simultaneously. To further allow users for experimenting with that, next step is to add a small helper that can get along with simple data types, so that also shell scripts can make use of bpf syscall, f.e to read/write into maps. Generally, this allows for prepopulating maps, or any runtime altering which could influence eBPF program behaviour (f.e. different run-time classifications, skb modifications, ...), dumping of statistics, etc. 
Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860 Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com>	2015-04-27 16:39:23 -07:00
Nicolas Dichtel	afa5158f02	tc: fix compilation warning on 32bits arch The warning was: m_simple.c: In function ‘parse_simple’: m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat] Useful to be able to compile with -Werror. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>	2015-04-27 11:41:46 -07:00
Vadim Kochan	46679bbbe8	tc util: Fix possible buffer overflow when print class id Use correct handle buffer length. Signed-off-by: Vadim Kochan <vadim4j@gmail.com>	2015-04-20 10:06:02 -07:00
Felix Fietkau	b8d5c9a71b	tc: add support for connmark action Add ability to add the netfilter connmark support. Typical usage: ...lets tag outgoing icmp with mark 0x10.. iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10 ..add on ingress of $ETH an extractor for connmark... tc filter add dev $ETH parent ffff: prio 4 protocol ip \ u32 match ip protocol 1 0xff \ flowid 1:1 \ action connmark continue ...if the connmark was 0x11, we police to a ridic rate of 10Kbps tc filter add dev $ETH parent ffff: prio 5 protocol ip \ handle 0x11 fw flowid 1:1 \ action police rate 10kbit burst 10k Other ways to use the connmark is to supply the zone, index and branching choice. Refer to help. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2015-04-13 10:49:45 -07:00
Daniel Borkmann	6256f8c9e4	tc, bpf: finalize eBPF support for cls and act front-end This work finalizes both eBPF front-ends for the classifier and action part in tc, it allows for custom ELF section selection, a simplified tc command frontend (while keeping compat), reusing of common maps between classifier and actions residing in the same object file, and exporting of all map fds to an eBPF agent for handing off further control in user space. It also adds an extensive example of how eBPF can be used, and a minimal self-contained example agent that dumps map data. The example is well documented and hopefully provides a good starting point into programming cls_bpf and act_bpf. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>	2015-04-10 13:31:19 -07:00
Stephen Hemminger	bd733e4088	Merge branch 'master' into net-next Conflicts: man/man8/ip-route.8.in	2015-04-07 08:56:14 -07:00
Vadim Kochan	8b90a9907e	tc class: Ignore if default class name file does not exist If '-nm' is specified, do not fail if there is no default class names file in /etc/iproute2. Changed default class name file cls_names -> tc_cls. Signed-off-by: Vadim Kochan <vadim4j@gmail.com>	2015-04-07 08:31:56 -07:00
Daniel Borkmann	11c39b5e98	tc: add eBPF support to f_bpf This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support for programmable classifiers"). A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file descriptors) out of its dedicated sections, and via bpf(2) into the kernel and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for annotations, currently, I've used the file name for that, so that the user can easily identify his filter when dumping configurations back. Example usage: clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y tc filter show dev em1 [...] filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c as my next step is to also add the same support for BPF action, so we can have a fully fledged eBPF classifier and action in tc. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com>	2015-03-24 15:45:23 -07:00
Daniel Borkmann	51cf36756c	tc: m_bpf: fix next arg selection after tc opcode Next argument after the tc opcode/verdict is optional, using NEXT_ARG() requires to have another argument after that one otherwise tc will bail out. Therefore, we need to advance to the next argument manually as done elsewhere. Fixes: `86ab59a666` ("tc: add support for BPF based actions") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@resnulli.us>	2015-03-24 15:39:53 -07:00
Vadim Kochan	4612d04d6b	tc class: Show class names from file It is possible to use class names from file /etc/iproute2/cls_names which tc will use when showing class info: # tc/tc -nm class show dev lo class htb 1:10 parent 1:1 leaf 10: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b class htb web#1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b class htb voip#1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b or to specify via file path: # tc/tc -nm -cf /tmp/cls_names class show dev lo Class names file contains simple "maj:min name" structure: 1:20 web 1:40 voip Signed-off-by: Vadim Kochan <vadim4j@gmail.com>	2015-03-15 12:27:40 -07:00
Daniel Borkmann	32caee9fc7	m_bpf: remove unrelevant help lines Left-overs when copying this over from cls_bpf. ;) Lets remove them. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us>	2015-02-27 19:00:51 -08:00
Jiri Pirko	86ab59a666	tc: add support for BPF based actions Signed-off-by: Jiri Pirko <jiri@resnulli.us>	2015-02-05 10:38:13 -08:00
Jiri Pirko	1d129d191a	tc: push bpf common code into separate file Signed-off-by: Jiri Pirko <jiri@resnulli.us>	2015-02-05 10:38:13 -08:00
Jamal Hadi Salim	564663b4ca	actions: Get vlan action to work in pipeline When specified in a graph such as: action vlan ... action foobar the vlan action chewed more than it can swallow Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2015-01-13 17:22:44 -08:00
Vadim Kochan	67e1d73be1	tc: Allow to easy change network namespace Added new '-netns' option to simplify executing following cmd: ip netns exec NETNS tc OPTIONS COMMAND OBJECT to tc -n[etns] NETNS OPTIONS COMMAND OBJECT e.g.: tc -net vnet0 qdisc Signed-off-by: Vadim Kochan <vadim4j@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us>	2014-12-27 10:22:34 -08:00
Vadim Kochan	d954b34a1f	tc class: Show classes as ASCII graph Added new '-g[raph]' option which shows classes in the graph view. Meanwhile only generic stats info output is supported. e.g.: $ tc/tc -g class show dev tap0 +---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b | +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b | +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b | | +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b | | | +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b | +---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b $ tc/tc -g -s class show dev tap0 +---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b | | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | | rate 0bit 0pps backlog 0b 0p requeues 0 | | | +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b | | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | | rate 0bit 0pps backlog 0b 0p requeues 0 | | | +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b | | | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | | | rate 0bit 0pps backlog 0b 0p requeues 0 | | | | | +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b | | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | | rate 0bit 0pps backlog 0b 0p requeues 0 | | | +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | rate 0bit 0pps backlog 0b 0p requeues 0 | +---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | rate 0bit 0pps backlog 0b 0p requeues 0 | +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb 
cburst 1600b | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | rate 0bit 0pps backlog 0b 0p requeues 0 | +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) | rate 0bit 0pps backlog 0b 0p requeues 0 | +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) rate 0bit 0pps backlog 0b 0p requeues 0 Signed-off-by: Vadim Kochan <vadim4j@gmail.com>	2014-12-27 10:16:51 -08:00
Stephen Hemminger	5c2c10b17e	Merge branch 'net-next'	2014-12-24 12:23:00 -08:00
Stephen Hemminger	3d0b7439df	whitespace cleanup Remove all trailing whitespace and space before tabs.	2014-12-20 15:47:17 -08:00
Stephen Hemminger	c9b8aef6ae	Merge branch 'master' into net-next	2014-12-09 16:33:59 -08:00
Stephen Hemminger	b2e116d6c3	tc: minor spelling fixes	2014-12-03 19:28:34 -08:00
Jiri Pirko	8b1c0216d8	tc: add support for vlan tc action Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Cong Wang <cwang@twopensource.com>	2014-12-03 09:29:21 -08:00
Stephen Hemminger	edd3979272	emp: fix warning on deprecated bison directive emp_ematch.y:12.1-13: warning: deprecated directive, use ‘%name-prefix’ [-Wdeprecated] %name-prefix="ematch_" ^^^^^^^^^^^^^	2014-10-09 08:31:10 -07:00
Jamal Hadi Salim	863ecb04b4	discourage use of direct policer interface Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-10-09 08:26:57 -07:00
Jamal Hadi Salim	287bf3a990	route classifier support for multiple actions route can now use the action syntax Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-10-09 08:26:57 -07:00
Jamal Hadi Salim	08139c2ffb	tcindex classifier support for multiple actions tcindex can now use the action syntax Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-10-09 08:26:56 -07:00
Andy Furniss	a07c6d6135	add missing underscore to man page and example nf_mark ematch The man page and the "fail" example are missing an underscore in the nf_mark ematch. eg. tc filter add dev eth0 parent ffff: basic match 'meta(nfmark gt 24)' classid 2:4 meta: unknown meta id ... >>meta(nfmark gt 24)<< ... ... meta(>>nfmark<< gt 24)... Usage: meta(OBJECT { eq | lt | gt } OBJECT) where: OBJECT := { META_ID | VALUE } META_ID := id [ shift SHIFT ] [ mask MASK ] Example: meta(nfmark gt 24) meta(indev shift 1 eq "ppp") meta(tcindex mask 0xf0 eq 0xf0) For a list of meta identifiers, use meta(list). Illegal "ematch" meta(list) does correctly show nf_mark and the above test works with nf_mark. Signed-off-by: Andy Furniss adf.lists@gmail.com	2014-10-09 08:24:00 -07:00
Jamal Hadi Salim	10f5a375ea	rsvp classifier support for multiple actions Example setup: sudo tc qdisc del dev eth0 root handle 1:0 prio sudo tc qdisc add dev eth0 root handle 1:0 prio sudo tc filter add dev eth0 pref 10 proto ip parent 1:0 \ rsvp session 10.0.0.1 ipproto icmp \ classid 1:1 \ action police rate 1kbit burst 90k pipe \ action ok tc -s filter show dev eth0 parent 1:0 filter protocol ip pref 10 rsvp filter protocol ip pref 10 rsvp fh 0x0001100a flowid 1:1 session 10.0.0.1 ipproto icmp action order 1: police 0x5 rate 1Kbit burst 23440b mtu 2Kb action pipe overhead 0b ref 1 bind 1 Action statistics: Sent 98000 bytes 1000 pkt (dropped 0, overlimits 761 requeues 0) backlog 0b 0p requeues 0 action order 2: gact action pass random type none pass val 0 index 2 ref 1 bind 1 installed 60 sec used 3 sec Action statistics: Sent 74578 bytes 761 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Tested-by: John Fastabend <john.r.fastabend@intel.com>	2014-09-29 08:47:33 -07:00
Jamal Hadi Salim	954de6c72b	actions: BugFix action stats to display with -s Was broken by commit `288abf513f` Lets not be too clever and have a separate call to print flushed actions info. Broken looks like: root@moja-1:~# tc actions add action drop index 4 root@moja-1:~# tc -s actions ls action gact action order 0: gact action drop random type none pass val 0 index 4 ref 1 bind 0 installed 9 sec used 4 sec The fixed version looks like: action order 0: gact action drop random type none pass val 0 index 4 ref 1 bind 0 installed 9 sec used 4 sec Sent 108948 bytes 1297 pkts (dropped 1297, overlimits 0) Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-09-29 08:47:19 -07:00
Jay Vosburgh	3757185b29	tc/netem: loss gemodel options fixes First, the default value for 1-k is documented as being 0, but is currently being set to 1 (100%). This causes all packets to be dropped in the good state if 1-k is not explicitly specified. Fix this by setting the default to 0. Second, the 1-h option is parsed correctly, however, the kernel is expecting "h", not 1-h. Fix this by inverting the "1-h" percentage before sending to and after receiving from the kernel. This does change the behavior, but makes it consistent with the netem documentation and the literature on the Gilbert-Elliott model, which refer to "1-h" and "1-k," not "h" or "k" directly. Last, fix a minor formatting issue for the options reporting. Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>	2014-08-04 10:15:10 -07:00
Yang Yingliang	aeb199d5ce	fq: allow options of fair queue set to ~0U Some options of fair queue cannot be (~0U). It leads to maxrate cannot be reset to unlimited because it cannot be (~0U). Allow the options being ~0U. Tested by the following command: # tc qdisc add dev eth4 root handle 1: fq limit 2000 flow_limit 200 maxrate 100mbit quantum 2000 initial_quantum 1600 # tc -s -d qdisc show qdisc fq 1: dev eth4 root refcnt 2 limit 2000p flow_limit 200p buckets 1024 quantum 2000 initial_quantum 1600 maxrate 100Mbit Sent 1492 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 1 flows (0 inactive, 0 throttled) 0 gc, 0 highprio, 0 throttled # tc qdisc change dev eth4 root handle 1: fq limit 4294967295 flow_limit 4294967295 maxrate 34359738360 quantum 4294967295 initial_quantum 4294967295 # tc -s -d qdisc show qdisc fq 1: dev eth4 root refcnt 2 limit 4294967295p flow_limit 4294967295p buckets 1024 quantum 4294967295 initial_quantum 4294967295 Sent 38372 bytes 216 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 2 flows (1 inactive, 0 throttled) 0 gc, 2 highprio, 7 throttled Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>	2014-06-09 12:42:36 -07:00
Sergey V. Lobanov	3ff10e82c1	Fixed 'tc qdisc show' for tbf when latency<0 When limit<burst latency becomes <0, for example: # tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit # tc qdisc show qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s If latency<0 there is no reason to show it. Limit will be printed instead of latency when latency<0: # tc qdisc show qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>	2014-05-28 17:08:16 -07:00
Jamal Hadi Salim	288abf513f	actions: correctly report the number of actions flushed This also fixes a long standing bug of not sanely reporting the action chain ordering Sample scenario test on window 1(event window): run "tc monitor" and observe events on window 2: sudo tc actions add action drop index 10 sudo tc actions add action ok index 12 sudo tc actions ls action gact sudo tc actions flush action gact See the event window reporting two entries (doing another listing should show empty generic actions) Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-05-28 16:54:31 -07:00
Jamal Hadi Salim	9282d08d93	actions: keyword flowid or classid terminates action pipeline scenario testcase: TC="sudo ./tc/tc" DEV="dev eth0" $TC qdisc del $DEV ingress $TC qdisc add $DEV ingress $TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 action police rate 6Mbit burst 6Mbit drop flowid :1 $TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 action police rate 1Gbit burst 1Gbit pass flowid :1 $TC -s filter ls $DEV parent ffff: protocol ip $TC qdisc del $DEV ingress $TC qdisc add $DEV ingress $TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 flowid 1:1 action police rate 6Mbit burst 6Mbit drop $TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 flowid 1:2 action police rate 1Gbit burst 1Gbit pass $TC -s filter ls $DEV parent ffff: protocol ip $TC qdisc del $DEV ingress $TC qdisc add $DEV ingress $TC filter add $DEV parent ffff: protocol ip pref 10 \ u32 match ip protocol 1 0xff \ flowid 1:10 \ action skbedit mark 11 \ action police rate 10kbit burst 10k pipe index 1 \ action skbedit mark 12 \ action police rate 20kbit burst 20k pipe index 2 \ action mirred egress mirror dev dummy0 $TC -s filter ls $DEV parent ffff: protocol ip $TC qdisc del $DEV ingress $TC qdisc add $DEV ingress $TC filter add $DEV parent ffff: protocol ip pref 10 \ u32 match ip protocol 1 0xff \ action skbedit mark 11 \ action police rate 10kbit burst 10k pipe index 1 \ action skbedit mark 12 \ action police rate 20kbit burst 20k pipe index 2 \ action mirred egress mirror dev dummy0 \ flowid 1:10 $TC -s filter ls $DEV parent ffff: protocol ip Reported-by: Seann Herdejurgen <seann@herdejurgen.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-05-28 16:54:28 -07:00
Jamal Hadi Salim	cacba03b10	Remove unnecessary debug statement Reported-by: Seann Herdejurgen <seann@herdejurgen.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2014-05-28 16:54:26 -07:00
Natanael Copa	dd9cc0ee81	iproute2: various header include fixes for compiling with musl libc We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and sys/select for struct timeval. This fixes the following compile errors with musl libc: f_bpf.c: In function 'bpf_parse_opt': f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function) if (h == LONG_MIN \|\| h == LONG_MAX) { ^ ... tc_util.o: In function `print_tcstats2_attr': tc_util.c:(.text+0x13fe): undefined reference to `MIN' tc_util.c:(.text+0x1465): undefined reference to `MIN' tc_util.c:(.text+0x14ce): undefined reference to `MIN' tc_util.c:(.text+0x154c): undefined reference to `MIN' tc_util.c:(.text+0x160a): undefined reference to `MIN' tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow ... tc_stab.o: In function `print_size_table': tc_stab.c:(.text+0x40f): undefined reference to `MIN' ... fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function) (vni >> 24) \|\| vni == ULONG_MAX) ^ lnstat.h:28:17: error: field 'last_read' has incomplete type struct timeval last_read; /* last time of read */ ^ Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>	2014-05-28 16:51:39 -07:00
Andreas Greve	6e2e5ec28b	fix print_ipt: segfault if more than one filter with action -j MARK. BUG: tc filter show ... produces a segmentation fault if more than one filter rule with action -j MARK exists. Reason: In print_ipt(...) xtables is initialized with a pointer to the static struct tcipt_globals at xtables_init_all(). Later on the fields .opts and .options_offset of tcipt_globals are modified. The call of xtables_free_opts(1) at the end of print(...) does not restore the original values of tcipt_globals for the modified fields. It only frees some allocated memory and sets .opts to NULL. This leads to a segmentation fault when print_ipt() is called for the next filter rule with action -j MARK. Fix: Clone tcipt_globals on the stack as tmp_tcipt_globals and use it instead of tcipt_globals, so that tcipt_globals is not modified. Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>	2014-05-13 13:10:31 -07:00
Terry Lam	ac74bd2a71	support for Heavy Hitter Filter (HHF) qdisc $tc qdisc add dev eth0 hhf help Usage: ... hhf [ limit PACKETS ] [ quantum BYTES] [ hh_limit NUMBER ] [ reset_timeout TIME ] [ admit_bytes BYTES ] [ evict_timeout TIME ] [ non_hh_weight NUMBER ] $tc -s -d qdisc show dev eth0 qdisc hhf 8005: root refcnt 32 limit 1000p quantum 1514 hh_limit 2048 reset_timeout 40.0ms admit_bytes 131072 evict_timeout 1.0s non_hh_weight 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 drop_overlimit 0 hh_overlimit 0 tot_hh 0 cur_hh 0 HHF qdisc parameters: - limit: max number of packets in qdisc (default 1000) - quantum: max deficit per RR round (default 1 MTU) - hh_limit: max number of HHs to keep states (default 2048) - reset_timeout: time to reset HHF counters (default 40ms) - admit_bytes: counter thresh to classify as HH (default 128KB) - evict_timeout: threshold to evict idle HHs (default 1s) - non_hh_weight: DRR weight for mice (default 2) Signed-off-by: Terry Lam <vtlam@google.com>	2014-05-09 12:10:47 -07:00
Jay Vosburgh	8f9672af7a	tc/netem: fix loss state display and p14 parsing The display of the entire netem loss state is shown as if it were gemodel state, as the loss state information is assigned to the wrong pointer. Correct this by assigning the loss state to the correct pointer. Additionally, attempting to set netem loss state will result in random values in the p14 state probability because the option value passed to the kernel by tc netem is not parsed or initialized. Fix this by supplying a default value of 0 for p14 and parsing the p14 value if one is supplied. Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>	2014-05-09 12:06:58 -07:00
Hiroaki SHIMODA	4d4da09e00	htb: Move direct_qlen code part to htb_parse_opt(). The direct_qlen command option is used with qdisc operation. It happened to be implemented in htb_parse_class_opt() which is called with class operation. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com>	2014-03-21 14:20:06 -07:00
WANG Cong	1c9af05071	pedit: do not print debugging information by default Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>	2014-02-10 14:43:52 -08:00
Yang Yingliang	dad2f72bef	netem: add 64bit rates support netem supports 64bit rates starting from linux-3.13. Add 64bit rates support in tc tools. tc qdisc show dev eth0 qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com>	2014-01-20 12:32:15 -08:00
Yang Yingliang	a01de0a336	tbf: support sending burst/mtu to kernel directly To avoid loss when converting burst to buffer in userspace, send burst/mtu to the kernel directly. Kernel commit 2e04ad424b ("sch_tbf: add TBF_BURST/TBF_PBURST attribute") lets the kernel handle burst/mtu. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>	2014-01-20 12:32:14 -08:00
Vijay Subramanian	80dd880dd0	PIE: Proportional Integral controller Enhanced Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Mythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net>	2014-01-09 22:50:47 -08:00
Stephen Hemminger	ef056b2190	Merge branch 'master' into net-next-for-3.13	2014-01-09 22:44:17 -08:00
Jamal Hadi Salim	f24a7e7205	dont skip action order attached. cheers, jamal commit 58d78f9f6447df324cdeb99262442c5e3f1f924b Author: Jamal Hadi Salim <jhs@mojatatu.com> Date: Sun Dec 22 10:34:18 2013 -0500 dont skip displaying of action chains or lists by TCA_ACT_MAX_PRIO Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-12-28 10:57:34 -08:00
Jamal Hadi Salim	b159a7f1ae	allow batch gets of actions Attached. cheers, jamal commit c5f30cabef14c951596210b96bc9b423b0d39592 Author: Jamal Hadi Salim <hadi@mojatatu.com> Date: Sun Dec 22 10:24:17 2013 -0500 Allow batching of action gets Example: ---- tc actions get \ action gact index 100 \ action gact index 4 ---- Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-12-28 10:57:34 -08:00
Jamal Hadi Salim	352f6f97be	simple print newline attached. cheers, jamal commit d7869e6167c3553e93e254940b0647032b40fed8 Author: Jamal Hadi Salim <jhs@mojatatu.com> Date: Sun Dec 22 07:46:28 2013 -0500 print new line at the end for aesthetics Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-12-28 10:57:34 -08:00
Jamal Hadi Salim	4bfb21ca20	policer - retire old syntax attached. cheers, jamal commit b82057d9ec851a8aba8a295b959190ef5098f330 Author: Jamal Hadi Salim <jhs@mojatatu.com> Date: Sat Dec 21 17:00:11 2013 -0500 After a decade of trying to deprecate the old policer syntax, I believe it is time to kill it. The kernel build option for old policer has been gone for at least 5 years now (although backward compatibility is still there). Being backward compatible meant hijacking the keyword "action" and was obstructing policies like: tc filter add dev eth0 parent ffff: protocol ip pref 10 \ u32 match ip protocol 1 0xff flowid 1:10 \ action skbedit mark 1 \ action police rate 10kbit burst 10k pipe \ action skbedit mark 2 \ action police rate 20kbit burst 20k pipe \ action mirred egress mirror dev dummy0 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-12-28 10:57:34 -08:00
Jamal Hadi Salim	02b1d345b7	skbedit print missing metadata skbedit should print the index and other generic metadata info Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-12-28 10:57:34 -08:00
Jamal Hadi Salim	64b7db4db7	skbedit to default to pipe Allow skbedit to be used as is in an action chain by default without need to specify pipe Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-12-28 10:57:34 -08:00
Stephen Hemminger	4d98ab00de	Fix FSF address in file headers	2013-12-06 15:05:07 -08:00
Eric Dumazet	8cecdc2837	tc: more user friendly rates Display more user friendly rates. 10Mbit is more readable than 10000Kbit Before : class htb 1:2 root prio 0 rate 10000Kbit ceil 10000Kbit ... After: class htb 1:2 root prio 0 rate 10Mbit ceil 10Mbit ... Signed-off-by: Eric Dumazet <edumazet@google.com>	2013-12-02 23:48:11 -08:00
Yang Yingliang	ddc6243e9a	tbf: add 64bit rates support tbf supports 64bit rates starting from linux-3.13. Add 64bit rates support in tc tools. tc qdisc show dev eth0 qdisc tbf 1: root refcnt 2 rate 40000Mbit burst 230000b peakrate 50000Mbit minburst 87500b lat 50.0ms This is a followup to ("htb: support 64bit rates"). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Cc: Eric Dumazet <edumazet@google.com>	2013-12-02 23:46:56 -08:00
Eric Dumazet	8334bb325d	htb: support 64bit rates Starting from linux-3.13, we can break the 32bit limitation of rates on HTB qdisc/classes. Prior limit was 34.359.738.360 bits per second. lpq83:~# tc -s qdisc show dev lo ; tc -s class show dev lo qdisc htb 1: root refcnt 2 r2q 2000 default 1 direct_packets_stat 0 direct_qlen 6000 Sent 6591936144493 bytes 149549182 pkt (dropped 0, overlimits 213757419 requeues 0) rate 39464Mbit 114938pps backlog 0b 15p requeues 0 class htb 1:1 root prio 0 rate 50000Mbit ceil 50000Mbit burst 200000b cburst 0b Sent 6591942184547 bytes 149549310 pkt (dropped 0, overlimits 0 requeues 0) rate 39464Mbit 114938pps backlog 0b 15p requeues 0 lended: 149549310 borrowed: 0 giants: 0 tokens: 336 ctokens: -164 Signed-off-by: Eric Dumazet <edumazet@google.com>	2013-11-22 17:36:18 -08:00
Daniel Borkmann	d05df6861f	tc: add cls_bpf frontend This is the iproute2 part of the kernel patch "net: sched: add BPF-based traffic classifier". [Will re-submit later again for iproute2 when window for -next submissions opens.] Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch>	2013-10-30 16:45:05 -07:00
Nigel Kukard	9bea14ff6b	Fix tc stats when using -batch mode There are two global variables in tc/tc_class.c: __u32 filter_qdisc; __u32 filter_classid; These are not re-initialized for each line received in -batch mode: class show dev eth0 parent 1: classid 1:1 class show dev eth0 parent 1: classid 1:1 Error: duplicate "classid": "1:1" is the second value. This patch fixes the issue by initializing the two globals when we enter print_class(). Signed-off-by: Nigel Kukard <nkukard@lbsd.net>	2013-10-30 16:37:07 -07:00
Stephen Hemminger	734c0ca2ca	htb: remove old unused duplicate qdisc name Alexey had htb2 as name for version in ancient code.	2013-10-27 12:28:38 -07:00
Stephen Hemminger	0a502b21e3	Fix handling of qdisc without options Some qdiscs, like htb, want parse_qopt to be called even if no options are present. Fixes regression caused by: `e9e78b0db0` is the first bad commit commit `e9e78b0db0` Author: Stephen Hemminger <stephen@networkplumber.org> Date: Mon Aug 26 08:41:19 2013 -0700 tc: allow qdisc without options	2013-10-27 12:26:47 -07:00
Jamal Hadi Salim	e26520e5c1	action: typo nat fix If you taketh you giveth. I Went the LinuxWay and copied this for m_simple.c and noticed this one typo (I wonder where it came from?;->). Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-09-30 21:31:40 -07:00
Jamal Hadi Salim	087f46ee4e	tc: introduce simple action Simple action is already in the kernel for years now as an example. This complements it with user space control. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-09-30 21:29:34 -07:00
Stephen Hemminger	af60cf40c9	Merge branch 'net-next-3.11'	2013-09-23 13:16:48 -07:00
Eric Dumazet	b43f331828	htb: add support for direct_qlen attribute The TCA_HTB_DIRECT_QLEN attribute has been supported since linux-3.10. HTB classes use an internal pfifo queue, whose limit was not reported by tc and whose value was inherited from the device tx_queue_len at setup time. With this patch, tc displays the value and can change it. Signed-off-by: Eric Dumazet <edumazet@google.com>	2013-09-20 09:48:13 -07:00
Eric Dumazet	8f7574edd8	tc: support TCA_STATS_RATE_EST64 Since linux-3.11, the rate estimator can provide TCA_STATS_RATE_EST64 when the rate (bytes per second) is above 2^32 (~34 Gbits). Change tc to use this attribute for high rates. Signed-off-by: Eric Dumazet <edumazet@google.com>	2013-09-20 09:46:33 -07:00
Eric Dumazet	bc113e46a3	pkt_sched: fq: Fair Queue packet scheduler Support for FQ packet scheduler $ tc qd add dev eth0 root fq help Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ] [ quantum BYTES ] [ initial_quantum BYTES ] [ maxrate RATE ] [ buckets NUMBER ] [ [no]pacing ] $ tc -s -d qd qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14) backlog 0b 0p requeues 14 511 flows (511 inactive, 0 throttled) 110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit limit : max number of packets on whole Qdisc (default 10000) flow_limit : max number of packets per flow (default 100) quantum : the max deficit per RR round (default is 2 MTU) initial_quantum : initial credit for new flows (default is 10 MTU) maxrate : max per flow rate (default : unlimited) buckets : number of RB trees (default : 1024) in hash table. (consumes 8 bytes per bucket) [no]pacing : disable/enable pacing (default is enable) Usage : tc qdisc add dev $ETH root fq tc qdisc del dev $ETH root 2>/dev/null tc qdisc add dev $ETH root handle 1: mq for i in `seq 1 4` do tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq done Signed-off-by: Eric Dumazet <edumazet@google.com>	2013-09-20 09:43:40 -07:00
Jesper Dangaard Brouer	3e92ff522a	linklayer interface between kernel and tc/userspace This iproute2 tc patch is connected to the kernel - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling) The rate table calculated by tc has been replaced in the kernel and is no longer used for lookups. This happened in kernel release v3.8, caused by kernel - commit 56b765b79 ("htb: improved accuracy at high rates"). This change unfortunately caused breakage of the tc overhead and linklayer parameters. Kernel overhead handling got fixed in kernel v3.10 by - commit 01cb71d2d47 (net_sched: restore "overhead xxx" handling) Kernel linklayer handling got fixed in kernel v3.11 by - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling) The linklayer fix introduced a struct change that allows the linklayer attribute to be transferred between tc and the kernel. This patch makes use of this linklayer attribute. The linklayer setting is transferred to the kernel, and the linklayer setting received from the kernel is printed with a "linklayer" prefix when listing the current configuration. The default TC_LINKLAYER_ETHERNET is only printed in detailed output mode. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2013-09-03 08:21:24 -07:00
Stephen Hemminger	e9e78b0db0	tc: allow qdisc without options Pfifo_fast needs no options. So don't force it to have parsing code.	2013-08-26 08:41:19 -07:00
Stephen Hemminger	b8a45897b9	More minor spelling fixes	2013-08-04 15:10:05 -07:00
Stephen Hemminger	a3aa47a559	Make tc and ip batch mode consistent Change the code for tc and ip so that batch mode is handled the same.	2013-07-16 10:04:05 -07:00
Eric Dumazet	a303853e84	get_rate: detect 32bit overflows On Mon, 2013-06-03 at 16:36 +0100, Ben Hutchings wrote: > Oops, I read this as being strtol() currently, not strtod(). Currently > '1.5gbit' will work, but this change will break that. So I think you > need to keep bps as a double. Arg > Then here I think the check should be *rate != floor(bps), i.e. accept > rounding down of a non-integer number of bytes but any other change is > assumed to be overflow. Thanks Ben, here is v4 then ;) [PATCH v4] get_rate: detect 32bit overflows Current rate limit is 34.359.738.360 bit per second, and unfortunately 40Gbps links are above it. overflows in get_rate() are currently not detected, and some users are confused. Let's detect this and complain. Note that some qdisc are ready to get extended range, but this will need additional attributes and new iproute2 With help from Ben Hutchings Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>	2013-06-07 09:24:56 -07:00
Stephen Hemminger	22fa92e367	htb: fix indentation iproute2 uses kernel style indenting	2013-06-07 08:54:45 -07:00
Eric Dumazet	44f1ff0afc	htb: report overhead attribute "tc class show dev ..." omits the overhead attribute for HTB. After patch I have : tc class add dev $DEV parent 1: classid 1:1 est 1sec 4sec htb \ rate 12Mbit mtu 1500 quantum 1514 overhead 20 tc class show dev $DEV class htb 1:1 root prio 0 rate 12000Kbit overhead 20 ceil 12000Kbit burst 1500b cburst 1500b Signed-off-by: Eric Dumazet <edumazet@google.com>	2013-06-07 08:53:53 -07:00
Alexander Duyck	cfa292defa	iproute2: act_ipt fix xtables breakage on older versions. In trying to build on a RHEL6.3 I ran into several build issues that are addressed in this patch. The first is that xtables_merge_options only has 3 parameters. It appears this is how this code was originally. As such for the case where the version is less than 6 I am assuming it would be correct to maintain the original setup that only had 3 parameters being passed instead of 4. I also ran into an issue with the define for __ALIGN_KERNEL not being present. I believe this may be due to the fact that __ALIGN_KERNEL was moved into a separate header from ALIGN after the UAPI changes. In order to just cover all of the bases I have moved the main definition for the macros into __ALIGN_KERNEL_MASK and __ALIGN_KERNEL and if ALIGN is also needed then it is just a direct redefine to __ALIGN_KERNEL. Cc: Hasan Chowdhury <shemonc@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-05-01 08:01:47 -07:00
Stephen Hemminger	e7b24b67db	Fix build when shared libraries are disabled On some platforms, shared libraries are not used. The stub code need some updating to not generate errors.	2013-03-13 08:29:59 -07:00
Kees van Reeuwijk	3bed7bb7e7	iproute2: clearer error messages for fifo and tbf qdiscs Clearer error messages for fifo and tbf qdiscs: - Say who is complaining - Don't just say a parameter is bad, show the offending parameter - Be clearer about duplicate parameters vs illegal pairs of parameters - Try to give multiple error messages rather than let the user discover the errors one by one - When there are parameter aliases, try to use the variant that was used, or at least mention them all Note that in the old version an empty parameter list to tbf would just cause an explain() message without a specific error message. By simply removing the relevant error check, the code now handles this error more gracefully by printing an error message for all mandatory parameters. It still prints the explain() message. Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>	2013-02-21 08:34:34 -08:00
Stephen Hemminger	d1f28cf181	ip: make local functions static	2013-02-12 11:38:35 -08:00
Benjamin Poirier	5ab3a4de5e	Use pkg-config to obtain xtables.h path On openSUSE 12.2 (at least) xtables.h is not installed in the system-wide include dir but in /usr/include/iptables-1.4.16.3/. This results in the following build failure: em_ipset.c:26:21: fatal error: xtables.h: No such file or directory Other includers of xtables.h already call out to pkg-config	2013-02-11 09:19:54 -08:00
Johannes Naab	e72ca3fbb0	iproute2: tc netem rate: allow negative packet/cell overhead by fixing the parsing of command-line tokens Signed-off-by: Johannes Naab <jn@stusta.de>	2013-02-04 09:06:50 -08:00
Jamal Hadi Salim	852d51222d	iproute2: act_ipt fix xtables breakage Fixes breakage with xtables API starting with version 1.4.10 Signed-off-by: Hasan Chowdhury <shemonc@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>	2013-01-16 08:14:48 -08:00
Strake	5bd9dd49ae	include needed files Needed to build iproute2 with musl	2012-12-23 11:49:06 -08:00
Mike Frysinger	e4fc4ada33	allow pkg-config to be customized Rather than hard coding `pkg-config`, use ${PKG_CONFIG} so people can override it to their specific version (like when cross-compiling). This is the same way the upstream pkg-config code works. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2012-11-11 16:21:34 -08:00
Matt Burgess	92905c6e0d	iproute2-3.6.0 assumes presence of iptables Hi, When compiling iproute2-3.6.0 on a host that doesn't have iptables available, I get the following error: gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -DCONFIG_GACT -DCONFIG_GACT_PROB -DYY_NO_INPUT -c -o em_ipset.o em_ipset.c em_ipset.c:26:21: fatal error: xtables.h: No such file or directory Fixed by the following patch, which guards the building of em_ipset.o on the presence of suitable headers. Thanks, Matt.	2012-10-03 08:51:29 -07:00
Rostislav Lisovy	7b5f30e14f	Ematch used to classify CAN frames according to their identifiers This ematch enables effective filtering of CAN frames (AF_CAN) based on CAN identifiers with masking of compared bits. Implementation utilizes bitmap based classification for standard frame format (SFF) which is optimized for minimal overhead. Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>	2012-08-20 13:11:55 -07:00
Dan Kenigsberg	f1675d615b	utils: invarg: msg precedes the faulty arg Fix all calls that reversed the arg order. Signed-off-by: Dan Kenigsberg <danken@redhat.com>	2012-08-17 13:35:36 -07:00
Florian Westphal	8194411a42	tc: add ipset ematch example usage: tc filter add dev $dev parent $id: basic match not ipset'(foobar src)' .. also updates iproute2/ematch_map, else tc complains: Error: Unable to find ematch "ipset" in /etc/iproute2/ematch_map Please assign a unique ID to the ematch kind the suggested entry is: 8 ipset when trying to use this ematch. (text ematch (5) only exists in kernel, a vlan ematch (6) exists neither in kernel nor userspace, but kernel headers define TCF_EM_VLAN == 6).	2012-08-13 08:33:50 -07:00
Li Wei	6cef544b96	tc: man: change man page and comment to conform to code's behavior. Since the get_rate() code incorrectly interpreted bare numbers, its behavior is not the same as the man page and comment described. We need to change the man page and comment to stay compatible with the existing usage by scripts.	2012-07-12 09:05:28 -07:00
Li Wei	424adc19bf	tc: filter: validate filter priority in userspace. Because we use the high 16 bits of tcm_info to pass the prio value to the kernel, its range is [0, 0xffff]. Without validation in tc, when the user passes a larger (>65535) priority, the actual priority set in the kernel would confuse the user. So, add validation to ensure prio is in range.	2012-07-10 15:39:30 -07:00
Hiroaki SHIMODA	690b11f4a6	tc: u32: Fix firstfrag filter. On current firstfrag filter, all non fragmented packets are matched. firstfrag should check MF bit. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>	2012-07-10 15:39:02 -07:00
Hiroaki SHIMODA	1d62f99fe2	tc: u32: Fix icmp_code off. The off of icmp_code is not 20 but 21. Also offmask should be 0 unless nexthdr+ is specified. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>	2012-07-10 15:39:02 -07:00
Li Wei	3c4f545633	tc: prio: Perform more strict check on priomap. Since band numbers count from zero, band must be less than opt.bands.	2012-06-18 12:25:08 -07:00
Vijay Subramanian	50a3ec3c46	tc-codel: Update usage text codel can take 'noecn' as an option. This also makes it consistent with the manpage. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>	2012-05-24 15:02:05 -07:00
Eric Dumazet	c3524efc14	fq_codel: Fair Queue Codel AQM Fair Queue Codel packet scheduler Principles : - Packets are classified (internal classifier or external) on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed on same slot) - Each flow has a CoDel managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered (CoDel uses a FIFO) - head drops only. - ECN capability is on by default. - Very low memory footprint (64 bytes per flow) tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ] [ target TIME ] [ interval TIME ] [ noecn ] [ quantum BYTES ] Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Kathleen Nichols <nichols@pollere.com> Cc: Van Jacobson <van@pollere.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Changli Gao <xiaosuo@gmail.com>	2012-05-22 14:17:49 -07:00
Eric Dumazet	185d88f99b	tc_codel: Controlled Delay AQM An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson. http://queue.acm.org/detail.cfm?id=2209336 This AQM main input is no longer queue size in bytes or packets, but the delay packets stay in (FIFO) queue. As we don't have infinite memory, we still can drop packets in enqueue() in case of massive load, but mean of CoDel is to drop packets in dequeue(), using a control law based on two simple parameters : target : target sojourn time (default 5ms) interval : width of moving time window (default 100ms) Selected packets are dropped, unless ECN is enabled and packets can get ECN mark instead. Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ] [ interval TIME ] [ ecn ] qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0) rate 202365Kbit 16708pps backlog 113550b 75p requeues 0 count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us maxpacket 1514 ecn_mark 84399 drop_overlimit 0 CoDel must be seen as a base module, and should be used keeping in mind there is still a FIFO queue. So a typical setup will probably need a hierarchy of several qdiscs and packet classifiers to be able to meet whatever constraints a user might have. One possible example would be to use fq_codel, which combines Fair Queueing and CoDel, in replacement of sfq / sfq_red. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>	2012-05-22 14:13:52 -07:00
Vijay Subramanian	1070205dc0	tc-netem: Add support for ECN packet marking This patch provides support for marking packets with ECN instead of dropping them with netem. This makes it possible to make use of the netem ECN marking feature that was added recently to the kernel. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>	2012-05-22 14:10:21 -07:00
Christoph J. Thompson	5c434a9e5a	iproute2 - Fix up and simplify variables pointing to install directories Define where the iproute2 config files are located. Get rid of trailing slashes for paths in several files. Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>	2012-04-12 09:49:10 -07:00
Stephen Hemminger	ff24746cca	Convert to use rta_getattr_ functions Use new functions (inspired by libmnl) to do type safe access of routing attributes	2012-04-10 08:47:55 -07:00
Anton Danilov	90d98edf39	csum action, fix typo	2012-03-15 14:24:59 -07:00
Andreas Henriksson	f526af995e	iproute: fix tc -iec display of Mibit rates As reported by Thomas Mühlgrabner <muehltom@cable.vol.at> in http://bugs.debian.org/662979 : When showing htb class configuration with "tc -iec class show", the output for Mibit is actually the value for bit. Example: configure a class with a ceil of 1000Mibit. Output states 1048576000 Mibit. The cause is missing parentheses in the display code of tc.... (Please also note that a lower value of 100Mibit will be displayed as 102400 Kibit, which I think is kind of ugly.) Reported-by: Thomas Mühlgrabner <muehltom@cable.vol.at> Signed-off-by: Andreas Henriksson <andreas@fatal.se>	2012-03-10 09:13:58 -08:00
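The effect of the missing parentheses described in the commit above can be sketched in a few lines of C. The function names are hypothetical and are not taken from tc; the sketch assumes the display code divided the rate by `1024.0*1024.0` without grouping the divisor, which matches the reported symptom (the Mibit figure equals the value in bit).

```c
/* Hypothetical helpers, not copied from tc: they only illustrate how a
 * missing pair of parentheses changes the result of the division. */

/* Without parentheses C evaluates left to right, i.e.
 * (bits / 1024.0) * 1024.0, which gives back the value in bit. */
double to_mibit_unparenthesized(double bits)
{
	return bits / 1024.0 * 1024.0;
}

/* With parentheses the divisor is 1024 * 1024, which gives Mibit. */
double to_mibit(double bits)
{
	return bits / (1024.0 * 1024.0);
}
```

With a ceil of 1000Mibit (1048576000 bit), the first helper returns 1048576000 and the second returns 1000, which is the difference the bug report describes.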
Yegor Yefremov	8ced4fcd50	iproute2: cleanup dependencies LIBNETLINK will be defined in the main Makefile, so both ../lib/libnetlink.a ../lib/libutil.a will be automatically appended during linking. Otherwise ../lib/libnetlink.a ../lib/libutil.a will appear twice during linking. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>	2012-02-27 08:27:54 -08:00
Petr Sabata	e2a4536a43	iproute2: tc - mqprio formatted print fix Just a minor correction of mqprio printf()'s. Reported-by: Petr Písař <ppisar@redhat.com> Signed-off-by: Petr Šabata <contyk@redhat.com>	2012-02-22 15:23:12 -08:00
Stephen Hemminger	d798a0483e	red: add missing include math.h red now uses pow() function.	2012-02-06 09:45:50 -08:00
Vijay Subramanian	14a1c164d1	netem: Fail cleanly if user input is wrong (Resending patch since it looks like my earlier mail did not make it to netdev). netem reordering requires that the delay parameter be given. Currently, if no delay is given, tc prints the error message but still installs the qdisc. Fix this by printing the usage and failing cleanly. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>	2012-01-20 11:21:58 -08:00
Eric Dumazet	1b6f0bb5be	gred: support TCA_GRED_MAX_P attribute TCA_GRED_MAX_P permits to express high resolution probabilities. New output (on 3.3+ kernel) : qdisc gred 9442: root refcnt 17 DP:0 (prio 1) Average Queue 0b Measured Queue 0b Packet drops: 0 (forced 0 early 0) Packet totals: 20 (bytes 2584) limit 31460b min 3000b max 9000b ewma 5 probability 0.05 Scell_log 15 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2012-01-20 08:12:24 -08:00
Eric Dumazet	650252d8c3	choke: support TCA_CHOKE_MAX_P TCA_CHOKE_MAX_P permits to express high resolution RED probability. tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec choke \ limit 90 ecn min 10 max 30 probability 0.05 bandwidth 10Mbit Before patch : tc -s -d qdisc show dev eth3 qdisc ... limit 90p min 10p max 30p ecn ewma 3 Plog 19 Scell_log 13 After : qdisc ... limit 90p min 10p max 30p ecn ewma 3 probability 0.05 Scell_log 13 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2012-01-20 08:12:23 -08:00
Eric Dumazet	6987ecf083	sfq: add optional RED on top of SFQ Adds an optional Random Early Detection on each SFQ flow queue. Traditional SFQ limits count of packets, while RED permits to also control number of bytes per flow, and adds ECN capability as well. 1) We dont handle the idle time management in this RED implementation, since each 'new flow' begins with a null qavg. We really want to address backlogged flows. 2) if headdrop is selected, we try to ecn mark first packet instead of currently enqueued packet. This gives faster feedback for tcp flows compared to traditional RED [ marking the last packet in queue ] Example of use : tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \ limit 3000 headdrop flows 512 divisor 16384 \ redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 ewma 6 min 8000b max 60000b probability 0.2 ecn prob_mark 0 prob_mark_head 4876 prob_drop 6131 forced_mark 0 forced_mark_head 0 forced_drop 0 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007 requeues 0) rate 99483Kbit 8219pps backlog 689392b 456p requeues 0 In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled flows, we can see number of packets CE marked is smaller than number of drops (for non ECN flows) If same test is run, without RED, we can check backlog is much bigger. qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0) rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2012-01-20 08:12:22 -08:00
Eric Dumazet	54a2fce832	red: fix adaptive spelling Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2012-01-20 08:12:21 -08:00
Eric Dumazet	e7e4abea3e	red: Add adaptative algo Enable Adaptative RED algo, using : tc qdisc ... red limit BYTES ... adaptative ... Support of high precision probability/max_p setting and reporting, with support of old kernels. With a new kernel, "Plog ..." is replaced in tc output by "probability value" : qdisc red 10: dev eth3 parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5 probability 0.09 Scell_log 15	2012-01-19 14:45:20 -08:00
Hagen Paul Pfeifer	6b8dc4deea	tc: netem rate shaping and cell extension This patch adds rate shaping as well as cell support. The link-rate can be specified via rate options. Three optional arguments control the cell knobs: packet-overhead, cell-size, cell-overhead. To ratelimit eth0 root queue to 5kbit/s, with a 20 byte packet overhead, 100 byte cell size and a 5 byte per cell overhead: tc qdisc add dev eth0 root netem rate 5kbit 20 100 5 Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>	2012-01-19 14:28:27 -08:00
Jan Engelhardt	8e91a80d97	iproute2: fix calling up the xt action Upstream: has not been sent yet Requesting the xt action never succeeded because it registered using the wrong name.	2012-01-03 15:07:38 -08:00
Jan Engelhardt	d7aa57d450	iproute2: proper detection of libxtables position and flags Upstream: not sent yet Any tests involving iptables _MUST_ utilize pkg-config to find the proper locations of the installation.	2012-01-03 15:05:25 -08:00
Stephen Hemminger	155ad8023b	ematch: fix warning about unused input() Use existing compile flag to indicate that input() is not used by tc ematch, fixes compiler warning.	2012-01-03 13:55:59 -08:00
Stephen Hemminger	5761f04fb8	ematch: fix warning about yyerror and const yyerror() should take const char * on current bison.	2012-01-03 13:55:00 -08:00
Stephen Hemminger	cd70f3f522	libnetlink: remove unused junk callback Both rtnl_talk and rtnl_dump had a callback for handling portions of netlink message that do not match the correct pid or seq. But this callback was never used by any part of iproute2 so remove it.	2011-12-28 10:37:12 -08:00
Eric Dumazet	d060de7f8d	netem: fix a typo in explain() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-12-24 11:21:33 -08:00
Stephen Hemminger	3c7950af59	netem: add support for 4 state and GE loss model Incorporate support for new loss models.	2011-12-22 17:08:11 -08:00
Eric Dumazet	841fc7bc98	red: harddrop support and cleanups Add harddrop support (kernel support added a long time ago), and various cleanups. min BYTES, max BYTES are now optional and follow Sally Floyd's recommendations. By the way, our default 2% probability is a bit low, Sally recommends 10%. Not a big deal if upcoming adaptative algo is deployed. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-12-08 16:43:18 -08:00
Eric Dumazet	ab15aeacf5	red: make burst optional Documentation advises to set burst to (min+min+max)/(3*avpkt) Let tc do this automatically if user doesnt provide burst himself. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-12-01 09:23:49 -08:00
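The burst rule quoted in the commit above, (min+min+max)/(3*avpkt), can be written as a small helper. This is a sketch under stated assumptions: the function name and the choice to return 0 for a zero `avpkt` are hypothetical and are not copied from tc.

```c
/* Hypothetical helper, not copied from tc: derive a RED burst (in packets)
 * from min and max (bytes) and avpkt (bytes) using the documented rule
 * burst = (min + min + max) / (3 * avpkt).
 * Returns 0 when avpkt is 0 so the caller can treat it as "no default". */
unsigned int red_default_burst(unsigned int min, unsigned int max,
			       unsigned int avpkt)
{
	if (avpkt == 0)
		return 0;
	return (2 * min + max) / (3 * avpkt);
}
```

For example, min 30000, max 90000 and avpkt 1000 give a burst of 50 packets.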
Eric Dumazet	0cf67ead7b	red: give a hint about burst value Check for burst values that are too small. Reported-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-12-01 09:23:43 -08:00
Thomas Jarosch	fcbd0165fc	tc: Use correct variable type for get_distribution() result get_distribution() returns an int. cppcheck reported: [tc/q_netem.c:243]: (style) Checking if unsigned variable 'dist_size' is less than zero. The mismatch actually rendered the error checking after get_distribution() ineffective. Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>	2011-11-23 14:46:24 -08:00
Thomas Jarosch	a3da01c519	tc: Remove unused variable 'res'. Detected by cppcheck. Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>	2011-11-23 14:46:21 -08:00
Stephen Hemminger	93ba481acb	cleanup ematch yacc files make clean needs to remove all the yacc output files for ematch.	2011-11-02 16:39:36 -07:00
Michal Soltys	41f6004139	HFSC (7) & (8) documentation + assorted changes This patch adds detailed documentation for HFSC scheduler. It roughly follows HFSC paper, but tries to not rely too much on math side of things. Post-paper/Linux specific subjects (timer resolution, ul service curve, etc.) are also discussed. I've read it many times over, but it's a lengthy chunk of text - so try to be understanding in case I made some mistakes. tc-hfsc(7): explains algorithm in detail (very long) tc-hfsc(8): explains command line options briefly tc(8): adds references to new man pages Makefile: adds man7 directory to install target q_hfsc.c: minimal help text changes, consistency with tc-hfsc(8) Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2011-11-02 16:33:50 -07:00
Mike Frysinger	aa48b5931a	tc: fix parallel build file with lex/yacc Building iproute2 in parallel might hit the race failure: emp_ematch.l:2:30: fatal error: emp_ematch.yacc.h: No such file or directory make[1]: *** [emp_ematch.lex.o] Error 1 This is because we currently allow the yacc/lex files to generate and compile in parallel. So add a simple dependency to make sure yacc has finished before we attempt to compile the lex output. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2011-10-18 15:02:21 -07:00
Thomas Jarosch	1a6543c56b	Fix memory leak of lname variable in get_target_name() Detected by cppcheck. Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>	2011-10-07 11:17:10 -07:00
Thomas Jarosch	9f1ba57016	Fix wrong sanity check in choke_parse_opt() Detected by cppcheck. Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>	2011-10-07 11:17:03 -07:00
Thomas Jarosch	6d5ee98a7c	Fix wrong comparison in cmp_print_eopt() Detected by cppcheck. Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>	2011-10-07 11:16:15 -07:00
Dan McGee	4f3626f920	xt: only unset fields if m is non NULL	2011-08-31 12:18:49 -07:00
Florian Westphal	05fb9184f2	tc: filter: fix default 'protocol all' on little-endian platforms When specifying filters without 'protocol' keyword, tc will default to 'protocol all'. Unfortunately, this missed a byte-ordering conversion.	2011-08-31 10:55:13 -07:00
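The missing byte-ordering conversion mentioned in the commit above can be sketched as follows. The helper name and the `SKETCH_ETH_P_ALL` macro are hypothetical; the macro only mirrors the value of `ETH_P_ALL` (0x0003) so that the example builds without Linux headers. The assumption is that the filter protocol is carried in network byte order, so a host-order constant has to go through `htons()`.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Mirrors the value of ETH_P_ALL (0x0003) from <linux/if_ether.h>; it is
 * redefined here only so the sketch builds without Linux headers. */
#define SKETCH_ETH_P_ALL 0x0003

/* Hypothetical helper: the protocol field is assumed to be carried in
 * network byte order, so the host-order constant is converted with
 * htons() before use. */
uint16_t default_filter_protocol(void)
{
	return htons(SKETCH_ETH_P_ALL);
}
```

On a big-endian host `htons()` leaves the value unchanged, which is why an omitted conversion only shows up on little-endian platforms.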
Stephen Hemminger	c441bd4c1b	Add QFQ scheduler Basic configuration support for QFQ. Still need to add manual page.	2011-07-13 13:46:34 -07:00
Stephen Hemminger	be181323c1	Remove redundant limits.h redo.	2011-07-13 09:49:17 -07:00
Andreas Henriksson	73de5d9680	iproute2: Fix building xt module against xtables version 6 iptables/xtables apparently changed API again.... Now you need to pass an extra parameter (orig_opts) which was not needed before. Sprinkle some lovely pre-processor magic to be compatible with both older and new versions. In the beginning of times XTABLES_VERSION_CODE didn't exist. Then it was (0x10000 * major + 0x100 * minor + patch) when it was first introduced (according to git), but now it's at 6... Don't know what official iptables releases have defined it to over time. Let's just hope none of the older versions that define it higher than 6 are still around.... so only the "current" versioning scheme is supported.... let's see how long this lasts now. For the API change in xtables, see: http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=commitdiff;h=600f38db82548a683775fd89b6e136673e924097 Signed-off-by: Andreas Henriksson <andreas@fatal.se>	2011-07-11 10:18:14 -07:00
Petr Sabata	5582c0cffd	iproute2: Remove unreachable code This patch removes unreachable, useless code. Signed-off-by: Petr Sabata <contyk@redhat.com>	2011-07-11 10:13:51 -07:00
Stephen Hemminger	49dff8c88c	xt match: fix set-never-used warning	2011-06-29 15:59:41 -07:00
Stephen Hemminger	02ee3dbc78	skbedit: fix set-never-used warning	2011-06-29 15:59:02 -07:00
Stephen Hemminger	bf808cbf84	tc: fix set never used warning in red	2011-06-20 14:34:30 -07:00
Stephen Hemminger	bcd7abddd4	tc filter: fix dport/sport in pretty print output Problem reported by Peter Lebbing on Debian. The decode of source and destination port filters in pretty print mode was backwards.	2011-05-19 09:19:17 -07:00
John Fastabend	892eba309f	iproute2: improve mqprio inputs for queue offsets and counts This changes mqprio input format to be more user friendly. Old usage, # ./tc/tc qdisc add dev eth3 root mqprio help Usage: ... mqprio [num_tc NUMBER] [map P0 P1...] [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1\|0] New usage, # ./tc/tc qdisc add dev eth3 root mqprio help Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...] [queues count1@offset1 count2@offset2 ...] [hw 1\|0] Suggested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>	2011-04-26 14:59:32 -07:00
John Fastabend	914953046a	iproute2: tc add mqprio qdisc support Add mqprio qdisc support. Output matches the following, qdisc mq 0: dev eth1 root qdisc mq 0: dev eth2 root qdisc mqprio 8001: dev eth3 root tc 8 map 0 1 2 3 4 5 6 7 1 1 1 1 1 1 1 1 queues:(0:7) (8:15) (16:23) (24:31) (32:39) (40:47) (48:55) (56:63) And usage is, Usage: ... mclass [num_tc NUMBER] [map P0 P1...] [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1\|0] Signed-off-by: John Fastabend <john.r.fastabend@intel.com>	2011-04-12 14:28:19 -07:00
Juliusz Chroboczek	d7f3299d59	tc : SFB flow scheduler Supports SFB qdisc (included in linux-2.6.39) 1) Setup phase : accept non default parameters 2) dump information qdisc sfb 11: parent 1:11 limit 1 max 25 target 20 increment 0.00050 decrement 0.00005 penalty rate 10 burst 20 (600000ms 60000ms) Sent 47991616 bytes 521648 pkt (dropped 549245, overlimits 549245 requeues 0) rate 7193Kbit 9774pps backlog 0b 0p requeues 0 earlydrop 0 penaltydrop 0 bucketdrop 0 queuedrop 549245 childdrop 0 marked 0 maxqlen 0 maxprob 0.00000 avgprob 0.00000 Signed-off-by: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-04-12 14:27:37 -07:00
Stephen Hemminger	59a935d204	Update email address of netem	2011-04-12 14:24:01 -07:00
Stephen Hemminger	d7ac9ad4f4	Fix warning in u32 from assignment in conditional	2011-04-12 14:23:39 -07:00
Eric Dumazet	f3f28c2126	sfq: add divisor support In 2.6.39, we can build SFQ queues with a given hash table size,	2011-02-25 12:59:53 -08:00
Stephen Hemminger	a4eca97cff	CHOKe scheduler TC commands for CHOKe qdisc	2011-01-31 09:09:50 -08:00
Gregoire Baron	3822cc986c	tc: add ACT_CSUM action support (csum) Add the iproute2 support for the ACT_CSUM action. Can be used as following, certainly in conjunction with the ACT_PEDIT action (pedit): # In order to DNAT (stateless) IPv4 packet from 192.168.1.100 to # 0x12345678 (18.52.86.120), and update the IPv4 header checksum and # the UDP checksum (the last one, only if the packet is UDP). tc filter add eth0 prio 1 protocol ip parent ffff: \ u32 match ip src 192.168.1.100/32 flowid :1 \ action pedit munge offset 16 u32 set 0x12345678 \ pipe csum ip and udp # In order to alter destination address of IPv6 TCP packets from fc00::1 # and correct the TCP checksum (nothing happened? except maybe for # checksums in the TCP payload ...). tc filter add eth0 prio 1 protocol ipv6 parent ffff: \ u32 match ip6 src fc00::1/128 match ip6 protocol 0x06 0xff flowid :1 \ action pedit munge offset 24 u32 set 0x12345678 \ pipe csum tcp	2010-12-01 11:17:46 -08:00
Changli Gao	7162c92148	iproute2: tc: f_flow: add key rxhash We can use rxhash to classify the traffic into flows. As rxhash may be supplied by NIC or RPS, it is cheaper. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>	2010-11-30 09:57:36 -08:00
Mike Frysinger	be3c4d4f3c	m_xt: stop using xtables_set_revision() iptables dropped the xtables_set_revision() function around version 1.4.9, so set the rev directly ourselves. This should be compatible back to the original version m_xt itself is designed for. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2010-11-30 09:48:38 -08:00
Stephen Hemminger	cb4bd0ec8d	Fix GRED options clearing Bug reported where priorities of GRED DP's are ignored. The option parsing sets opt then memset was clearing these values.	2010-08-25 09:04:55 -07:00
Stephen Hemminger	e3d153c1fb	Fix byte order of ether address match for u32 The u32 key match was incorrect byte order when using ether source or destination address matching.	2010-08-02 11:55:30 -07:00
Andreas Henriksson	02833d1b38	tc: make symbols loaded from tc action modules global. Fixes problems with xtables based MARK target ("ipt" module). When tc loads the "ipt" (xt) module it kept the symbols local, this made loading of libxtables not find the required struct. Currently ipt/xt is the only tc action module. iproute2 never seems to do dlclose. Hopefully the modules don't export more symbols than needed. In this situation hopefully the RTLD_GLOBAL flag won't hurt us. I've been using this patch in the Debian package of iproute for the last 3 weeks and no one has complained. ( This fixes http://bugs.debian.org/584898 ) Signed-off-by: Andreas Henriksson <andreas@fatal.se>	2010-08-02 09:54:59 -07:00
Stephen Hemminger	4b45abd1f0	Fix NULL pointer reference when using basic match If basic match has no tree of matches underneath then print_ematch would core dump.	2010-07-29 18:03:35 -07:00
Petr Lautrbach	0156412215	iproute: fix tc generating ipv6 priority filter This patch adds ipv6 filter priority/traffic class function static int parse_ip6_class(int argc_p, char *argv_p, struct tc_u32_sel sel) shifting filter value to 5th bit and ignoring "at" as header position is exactly given. Signed-off-by: Petr Lautrbach <plautrba@redhat.com>	2010-07-23 12:29:35 -07:00
Mike Frysinger	bf512683e0	tc: revert "echo" in install target The recent commit "iproute2: add option to build m_xt as a tc module" (`ab814d6355`) looks like it wrongly included debug changes in the install target. So drop the `echo` so the tc binary actually gets installed again. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2010-07-23 12:28:25 -07:00
Bart Trojanowski	608a96c727	fix build issues with flex ver 2.5 When building on an old environment, the flex generated tc/emp_ematch.lex.c file would not compile. The error given was: emp_ematch.lex.c:1686: error: expected ‘;’, ‘,’ or ‘)’ before numeric constant The emp_ematch.l uses 'str' as a start symbol name, and flex would create a '#define str 1' statement. This particular version of flex, unfortunately, used 'str' as names of string variables in the generated parser functions. This is line 1686 in the generated file: YY_BUFFER_STATE ematch__scan_string (yyconst char * str ) This patch just substitutes 'str' for 'lexstr' in emp_ematch.l to avoid the collision.	2010-04-22 15:27:42 -07:00
Andreas Henriksson	ab814d6355	iproute2: add option to build m_xt as a tc module (v3) This will build the xt module (action ipt) of tc as a shared object that is linked at runtime by tc if used, rather than built into tc. This is similar to how the atm qdisc support is handled (q_atm.so). Signed-off-by: Andreas Henriksson <andreas@xxxxxxxx>	2010-04-12 11:40:29 -07:00
Stephen Hemminger	edaaa11e5a	Workaround missing ALIGN() macro.	2010-03-29 17:37:49 -07:00
Stephen Hemminger	1b84ad557e	Remove mirred debug message Other commands are quiet if successful. mirred action had leftover debug message.	2010-03-29 17:32:37 -07:00
Stephen Hemminger	609ceb807d	Workaround missing ALIGN() macro XT_ALIGN() calls ALIGN macro but ALIGN is in kernel source not userspace.	2010-03-29 15:17:48 -07:00
Andreas Henriksson	12ddfff76c	iproute2: detect iptables modules dir in configure. Try to automatically detect iptables modules directory. Make the configure script look for iptables modules. This also makes it possible to specify it on the command line while building via "make IPT_LIB_DIR=/foo/bar". Signed-off-by: Andreas Henriksson <andreas@fatal.se>	2010-03-29 15:10:20 -07:00
jamal	e906975a53	skbedit: use get_u32 for parsing mark parsing a mark as a classid allows for acceptance of strange informal input. cheers, jamal commit aad0da6507ff8a95a63ed8e529c05f52be5b0e75 Author: Jamal Hadi Salim <hadi@cyberus.ca> Date: Mon Feb 15 06:45:29 2010 -0500 skbedit: use get_u32 for parsing mark get_u32 is the more appropriate parser for a mark. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>	2010-03-03 16:35:30 -08:00
Hagen Paul Pfeifer	f703129d34	tc: add new queue discipline: head drop fifo This adds the required changes to gain access to the head drop classful queuing discipline named pfifo_head_drop. Unlike pfifo or pfifo_fast, this queuing discipline drops the first packet in the case of queue congestion. As a result, the queue always contains the freshest packets. To replace the current root queueing discipline for eth0: $ tc qdisc replace dev eth0 root pfifo_head_drop And show statistics: $ tc -s qdisc show dev eth0 Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>	2010-03-03 16:15:44 -08:00
Florian Westphal	8d8de1139c	tc: remove stale code remove unused #define and "ok" statements. Signed-off-by: Florian Westphal <fwestphal@astaro.com>	2010-01-21 10:13:01 -08:00
Florian Westphal	ddf216c863	tc: red, gred, tbf: more helpful error messages $ tc qdisc add dev eth1 root tbf RTNETLINK answers: Invalid argument $ tc qdisc add dev eth1 root red RTNETLINK answers: Invalid argument with patch: $ tc qdisc add dev eth1 root red Required parameter (min, max, burst, limit, avpkt) is missing $ tc qdisc add dev eth1 root tbf Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS ... Signed-off-by: Florian Westphal <fw@strlen.de>	2010-01-21 10:12:57 -08:00
Mike Frysinger	73152614bc	tc: respect LDFLAGS for %.so targets Since there aren't any targets that currently use this pattern rule, this is more of a proactive fix. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2010-01-21 10:05:39 -08:00
Jamal Hadi Salim	e04dd30a38	skbedit: Add support to mark packets This adds support for setting the skb mark. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>	2009-12-26 11:12:43 -08:00
Stephen Hemminger	985f4578c6	Fix warning about strtod() return value	2009-12-26 10:20:50 -08:00
Andreas Henriksson	a36ceb85d7	Add new (iptables 1.4.5 compatible) tc/ipt/xt module. Add a new cleaned up m_xt.c based on m_xt_old.c The new m_xt.c has been updated to use the new names and new api that xtables exposes in iptables 1.4.5. All the old internal api cruft has also been dropped. Additionally, a configure script test is added to check for the new xtables api and set the TC_CONFIG_XT flag in Config. (tc/Makefile already handles this flag in previous commit.) Signed-off-by: Andreas Henriksson <andreas@fatal.se>	2009-12-26 10:09:27 -08:00
Andreas Henriksson	80d689d055	Keep the old tc/ipt/xt module for compatibility. Move the file and rename the configure flags. The file is being kept around for iptables < 1.4.5 compatibility. Signed-off-by: Andreas Henriksson <andreas@fatal.se>	2009-12-26 10:09:26 -08:00
Patrick McHardy	c90308ffc7	f_fw: fix compat mode The kernel takes a lack of options as indication that the fw classifier should operate in compatibility mode, where marks are mapped directly to classids. Commit `e22b42a` (tc mask patch) broke this by adding an empty TCA_OPTIONS attribute even if no handle is specified. Restore the old behaviour. Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-12-01 16:20:01 -08:00
Stephen Hemminger	232642c28c	Remove Changes: comments Discourage developers from putting change log in comments now that software has been under change control for 5 years.	2009-12-01 15:49:48 -08:00
Mike Frysinger	05b4f8492b	tc: remove dlfcn.h from files that dont need it A bunch of source files look like they're copy & pasted from other files, and some include header files that they don't actually need. Since dlfcn has very specific usage (and is a pain on a static-only system), drop it where it isn't really needed. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2009-11-13 14:14:07 -08:00
Mike Frysinger	f2e27cfb01	support static-only systems The iptables code supports a "no shared libs" mode where it can be used without requiring dlfcn related functionality. This adds similar support to iproute2 so that it can easily be used on systems like nommu Linux (but obviously with a few limitations -- no dynamic plugins). Rather than modify every location that uses dlfcn.h, I hooked the dlfcn.h header with stub functions when shared library support is disabled. Then symbol lookup is done via a local static lookup table (which is generated automatically at build time) so that internal symbols can be found. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2009-11-10 10:44:20 -08:00
Mike Frysinger	729cbe84b8	tc/q_atm.so: respect LDFLAGS The q_atm.so target defines its own link target, but it doesn't respect the $(LDFLAGS) variable. Signed-off-by: Mike Frysinger <vapier@gentoo.org>	2009-08-06 14:50:08 -07:00
Stephen Hemminger	1558971d43	fix handling of GRED DPs args	2009-05-26 15:58:05 -07:00
Denys Fedoryshchenko	f4a8b23d39	Filter class output by classid Sometimes while dividing bandwidth by classes it is useful to see how some specific class is doing live. With my simple patch it is possible to run watch -n1 "tc -s -d class show dev eth0.2022 classid 1:1520" and get live statistics: how packets are queued or dropped, and how much bandwidth is used (if an estimator is defined) for a specific class. Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>	2009-05-26 15:20:26 -07:00
Stephen Hemminger	ebde878097	Allow default DP of zero in gred To emulate WRED behaviour, allow default DP of zero.	2009-05-26 15:15:01 -07:00
Stephen Hemminger	d13cee6d59	Add IPV6 match pretty print	2009-05-26 15:14:29 -07:00
Stephen Hemminger	b4d41f41b6	Add u32 extension to match on ether source/destination Use existing u32 mechanism to match based on Ethernet header. No need for protocol that already exists.	2009-04-15 15:39:34 -07:00
Thomas Graf	ff213c4bf2	cgroup support Stephen, iproute2 part of the cgroup classifier that has been included upstream for a while. Please apply.	2009-04-13 13:38:33 -07:00
Stephen Hemminger	9fce67dd46	Remove goto chain The selector logic is clearer with if / else if	2009-04-03 09:44:04 -07:00
Stephen Hemminger	52d6a85050	remove duplicate limits.h	2009-03-27 11:07:46 -07:00
Petr Jediný	10494d2724	Changing commandline help text to be more uniform...	2009-03-27 11:05:44 -07:00
Stephen Hemminger	44e50c8e78	Add missing limits.h Need limits.h to get INT_MIN on Debian	2009-03-01 20:36:38 -08:00
Denys Fedoryschenko	a589dcda9c	Fix memory leak in local options This change was forgotten by Stephen in the last release Signed-off-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>	2009-02-19 09:04:06 -08:00
Jamal Hadi Salim	63c7d26f94	Breakage noticed when debian upgraded to xtables (iptables > 1.4.1) Many thanks to Yevgeny Kosarzhevsky <yevg@pisem.net> for reporting and a lot of testing Thanks to Jan Engelhardt <jengelh@medozas.de> for a lot of advice Thanks to Denys Fedoryschenko <denys@visp.net.lb> for some sample code that he tried and thanks to Andreas Henriksson <andreas@fatal.se> (who maintains iproute2 on debian) for the persistent followup. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>	2009-02-19 09:02:13 -08:00
Stephen Hemminger	46a6573259	fix uninitialized memory in tc_skbedit Original from: Alexander Duyck <alexander.h.duyck@intel.com> A bug was found in which the memory for the tc_skbedit struct was being used uninitialized to 0. Alternative version of original fix using initializer rather than memset. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2009-02-19 08:59:06 -08:00
Patrick McHardy	c86f34942a	iproute: add DRR support add DRR support This patch adds support for the DRR scheduler I just sent to iproute. Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-01-27 16:11:39 -08:00
Stephen Hemminger	bdc213423a	Fix leftovers from earlier change Still had references to l_name.	2009-01-07 17:20:14 -08:00
Denys Fedoryshchenko	6e34e7dc0a	Fix tc/m_ipt memory leaks 1)optind according iptables sources have to be set to 0. If it is set to 1, in batch it will mess up things. Also in iptables sources i notice that ->tflags and ->used need to be reset. 2)Since target->t = fw_calloc(1, size); allocated memory in function build_st, it have to be freed at the end, or in batch we will have memory leak. TODO: Probably it must be freed in all "return -1" cases in parse_ipt after build_st. About this i am not sure, up to Stephen. 3)new_name was malloc'ed, but not freed	2009-01-06 19:46:11 -08:00
Alexander Duyck	fe1a34fa81	add support for multiq qdisc Add support for multiq qdisc This patch adds the ability to configure the multiq qdisc. Since the qdisc does not require any input it will pull the number of bands directly from the device that it is added to the root of. usage: tc qdisc add dev <DEV> root handle <HANDLE> multiq Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2009-01-06 19:29:25 -08:00
Alexander Duyck	f72a7aab0c	add support for skbedit action Provides ability to edit queue_mapping field Provides ability to edit priority field usage: action skbedit [queue_mapping QUEUE_MAPPING] [priority PRIORITY] at least one option must be select, or both at the same time Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2009-01-06 19:27:03 -08:00
Stephen Hemminger	3a99df7074	tc filter help should just print usage Doing tc filter help should end argument processing. This prevents extraneous messages. Reported by Marcela Maslanova	2008-10-13 07:00:48 -07:00
Stephen Hemminger	bc7d1bd88d	Fix duplicate return Get rid of dead code	2008-09-19 08:49:07 -07:00
Andreas Henriksson	5e3bb534ae	iproute: DESTDIR vs LIBDIR. Hello Rafael Almeida. I noticed your patch adding DESTDIR support in the latest iproute2 release. Much appreciated! Soon the debian packages might be able to move to actually using "make install" rather then it's own installation procedure when building packages. I've noticed something that will break though.... Debian packages usually sets DESTDIR=debian/tmp/ and packages the contents of that directory as if it where the root file system. This will break the /usr/lib/{tc,ip}/ module loading, because they DESTDIR (/usr) will be /whatever-the-build-path-was/debian/tmp/lib/{tc,ip}/. I beleive others usually call this the LIBDIR to make the separation between DISTDIR being the (possibly temporary) place things are put when build is done, and LIBDIR (and others) are used for actual runtime paths. I'm attaching a patch that I think fixes this, but would be really happy if you could have a look at to verify I'm not screwing something up. -- Regards, Andreas Henriksson Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-09-17 22:04:02 -07:00
Jussi Kivilinna	839c8456fb	add generic size table for qdiscs Patch adds generic size table that is similar to rate table, with the difference that the size table stores link layer packet size. Based on patch by Patrick McHardy http://marc.info/?l=linux-netdev&m=115201979221729&w=2 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-09-17 21:57:15 -07:00
Patrick McHardy	87953940f9	cls_flow: add perturbation support commit 337628b9aca63fda7622701191d6304c83438909 Author: Patrick McHardy <kaber@trash.net> Date: Fri Jul 4 04:54:56 2008 +0200 cls_flow: add perturbation support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-09-17 21:53:37 -07:00
Stephen Hemminger	5a67f8f9d3	Update to 2.6.27 API The one issue was the old multiqueue API, so that is handled by tc_util.h	2008-09-15 12:05:11 -07:00
Denys Fedoryshchenko	11bbe7fd11	long/ulong iproute-git fix This patch fixes bug in Metadata ematch attributes parser strtoul on error return ULONG_MAX, not LONG_MAX Patch attached as file	2008-07-31 15:25:15 -07:00
Rafael Almeida	b514b3587e	Fixed installation when changing DESTDIR After changing the DESTDIR the installed binaries have some issues due to hard coded paths. For example, using distributions on NetEm would segfault. I've changed iplink.c and tc_util.c so they are now aware of DESTDIR. Along with that change I needed to change the main Makefile so it defines the DESTDIR macro when calling gcc. I also changed the paths so that during the installation sbin, etc, share and lib directories are created directly inside of the DESTDIR, instead of creating a usr directory inside that. That's the behaviour of most packages out there, so I think most users will be expecting that to happen.	2008-07-25 13:40:19 -07:00
Patrick McHardy	ae76106841	tc: don't set protocol field on filter delete > # tc filter show dev eth1 \| grep 4:29:d1 > filter parent 1: protocol ip pref 5 u32 fh 4:29:d1 order 209 key ht 4 > bkt 29 flowid 1:b7aa > > # tc filter del dev eth1 parent 1: pref 5 handle 4:29:d1 u32 > RTNETLINK answers: Invalid argument > We have an error talking to the kernel > > after rollback to package"sys-apps/iproute2-2.6.24.20080108" all > deleted normal... The current iproute version uses "protocol all" by default if it's not specified. This is actually only useful for creating new filters, on deletion an unset protocol is treated as wildcard.	2008-06-23 09:09:45 -07:00
Stephen Hemminger	b6da1afc73	ematch related bugfix and cleanup Bugfix: use strtoul rather than strtol for bstrtol to handle large key/mask. Deinline larger functions to save space.	2008-05-29 11:54:19 -07:00
jamal	1750abe2ba	Infrastructure for pretty printing And last for now .. cheers, jamal [PATCH 3/3] [TC/U32] Infrastructure for pretty printing This patch makes it easy to add pretty printers of different protocols. For starters it makes use of ipv4 and raw printers. Add more later ... Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>	2008-05-09 15:50:12 -07:00
jamal	eefcbc7206	Expose the filter protocol makes protocol accessible .. cheers, jamal [PATCH 2/3] [TC/FILTERS] Expose the filter protocol Expose the filter protocol so it can be used by underlying classifiers when they need it. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>	2008-05-09 15:44:46 -07:00
Stephen Hemminger	44dcfe8201	Change formatting of u32 back to default Don't break scripts that depend on previous offset/value format. Introduce a new -pretty flag for decoding, and (gasp) document the formatting arguments.	2008-05-09 15:42:34 -07:00
Patrick McHardy	083a5f00a1	Fix classifier help commit c504ffd627ac211eebf5ed34ef0fbfd7f1dbb347 Author: Patrick McHardy <kaber@trash.net> Date: Wed Mar 26 07:38:43 2008 +0100 [IPROUTE]: Fix classifier help The new check whether the user has specified a protocol makes "ip filter <type> help" fail with "protocol is required". This could be fixed by moving it further down, but a more user-friendly way is to simply use ETH_P_ALL as default if nothing is specified. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-17 10:07:02 -07:00
Jesper Dangaard Brouer	292f29b42c	ATM cell alignment. Introducing the function that does the ATM cell alignment, and modifying tc_calc_rtable() to use this based upon a linklayer parameter. Modified from original to use constants from atm.h and fix all the usages of rtable in same patch. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>	2008-04-17 10:04:31 -07:00
Stephen Hemminger	1a5bd776a2	In police, fix uninitialized "overhead" variable. Bug introduced by myself in an earlier patch series. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>	2008-04-17 09:12:38 -07:00
Jesper Dangaard Brouer	f71f75f39b	police, implement overhead parameter parsing. For police, implement overhead parameter parsing. The change is ABI (Application Binary Interface) backward compatible with older kernels, but will first have effect from kernel 2.6.24. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-04-01 11:27:42 -07:00
Jesper Dangaard Brouer	2a1f78b376	CBQ, doc usage of overhead parameter. CBQ remember to doc usage of overhead parameter. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-04-01 11:27:35 -07:00
Jesper Dangaard Brouer	08fd01843f	CBQ, implement overhead parameter parsing. For CBQ, implement overhead parameter parsing. The change is ABI (Application Binary Interface) backward compatible with older kernels, but will first have effect from kernel 2.6.24. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-04-01 11:27:25 -07:00
Jesper Dangaard Brouer	1db5e2ec13	CBQ use matches() function instead of strcmp(). Change CBQ to use matches() function instead of strcmp(). This resembles the usage in other parse functions, and allows partial command parameter matching. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-04-01 11:27:17 -07:00
Jesper Dangaard Brouer	2c42579f9c	TBF overhead parameter parsing. For TBF, implement overhead parameter parsing. The change is ABI (Application Binary Interface) backward compatible with older kernels, but will first have effect from kernel 2.6.24. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-04-01 11:27:06 -07:00
Mike Frysinger	418a217ad9	Do not strip binaries with `install` Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-04-01 11:26:47 -07:00
Stephen Hemminger	cfa440b0da	missing dport in f_u32 output Small typo from last change to decode filters. Should print dport not port.	2008-02-22 11:51:35 -08:00
Stephen Hemminger	4c9ffc2f8c	decode the output of u32 matches reverse the match offset/mask values into ip header matches.	2008-02-18 11:35:29 -08:00
Stephen Hemminger	e62077d0b6	break excessively long lines Cleanup code (slightly).	2008-02-18 10:51:42 -08:00
Stephen Hemminger	6695297433	Revert "rlim qdisc support" This reverts commit `7ca30b789d`. Since rlim isn't upstream (yet), drop it.	2008-02-18 10:13:25 -08:00
PJ Waskiewicz	e9acc2420c	Update various classifiers' help output for expected CLASSID syntax update: Fix the spelling of "hexidecimal" This updates the help output to specify that CLASSID should be hexidecimal. This makes sure that a user entering "flowid 1:10" gets his flow put into band 15 (0x10) and knows why. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-02-13 12:36:38 -08:00
Stephen Hemminger	84d66882aa	minor typo fixes A couple of obvious typo's.	2008-02-07 22:10:14 -08:00
Stephen Hemminger	de33a43055	Protocol field on tc_filter is required Kernel won't find matching filter if protocol value not provided.	2008-02-07 19:25:26 -08:00
Stephen Hemminger	ba26a6e853	fix typos in help message for meta match Make sure examples actually work.	2008-02-05 12:07:07 -08:00
Stephen Hemminger	5e76a87d4c	Change where vlan option shows up in help Vlan should not be in the socket section	2008-02-05 11:33:44 -08:00
Patrick McHardy	66862d3cc7	cls_flow: add vlan-tag support commit 94e9cba778cb97d77d9146dc3bd38ff195bc2c8a Author: Patrick McHardy <kaber@trash.net> Date: Sat Feb 2 18:22:16 2008 +0100 [IPROUTE]: cls_flow: add vlan-tag support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-02-05 08:36:59 -08:00
Patrick McHardy	9932abb498	Add flow classifier support [IPROUTE]: Add flow classifier support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-01-31 22:28:11 -08:00
Patrick McHardy	5626a24a8b	Add support for SFQ xstats [IPROUTE]: Add support for SFQ xstats Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-01-31 22:28:10 -08:00
Stephen Hemminger	42169181b8	whitespace typo in tc_common.h minor whitespace typo.	2008-01-31 21:26:00 -08:00
Stephen Hemminger	9becb950e9	vlan meta tag match Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2008-01-24 13:16:41 -08:00
Stephen Hemminger	d21dd573e7	Revert "TC action parsing bug fix" [...] > Commands like "tc filter add dev ppp0 parent ffff: protocol ip prio 50 > u32 match ip src 0.0.0.0/0 police rate 4mbit burst 10k drop flowid :1" > apparently no longer works. The flowid is not accepted anymore. > Reverting commit 720a2e8d99... which you authored seems to "fix" this. [...] After further investigation it seems clear to me that reverting the commit 720a2e8d990707749b2... is the correct thing to do, since the real fix for the problem this commit was supposed to fix was instead fixed in commit c29391c7c68f031e246c... Whatever you specify after a u32 police you will now get a syntax error, and according to "tc filter add u32 help" there are several things that you are supposed to be able to specify after a police. This reverts commit `720a2e8d99`.	2008-01-02 09:29:30 -08:00
Stephen Hemminger	4c7abb271b	Merge branch 'master' into net-2.6.25	2007-12-31 12:51:15 -08:00
Denys Fedoryshchenko	53c017880b	iptables compatibility New iptables 1.4.0 has some library names changed from libipt to libxt. It is preferable also to open libxt_ first, as newer "style". Signed-off-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2007-12-31 11:15:29 -08:00
Jesper Dangaard Brouer	eeee367d91	Change the rate table calc of transmit cost to use upper bound value. Patrick McHardy, Cite: 'its better to overestimate than underestimate to stay in control of the queue'. Illustrating the rate table array: Legend description rtab[x] : Array index x of rtab[x] xmit_sz : Transmit size contained in rtab[x] (normally transmit time) maps[a-b] : Packet sizes from a to b, will map into rtab[x] Current/old rate table mapping (cell_log:3): rtab[0]:=xmit_sz:0 maps[0-7] rtab[1]:=xmit_sz:8 maps[8-15] rtab[2]:=xmit_sz:16 maps[16-23] rtab[3]:=xmit_sz:24 maps[24-31] rtab[4]:=xmit_sz:32 maps[32-39] rtab[5]:=xmit_sz:40 maps[40-47] rtab[6]:=xmit_sz:48 maps[48-55] New rate table mapping, with kernel cell_align support. rtab[0]:=xmit_sz:8 maps[0-8] rtab[1]:=xmit_sz:16 maps[9-16] rtab[2]:=xmit_sz:24 maps[17-24] rtab[3]:=xmit_sz:32 maps[25-32] rtab[4]:=xmit_sz:40 maps[33-40] rtab[5]:=xmit_sz:48 maps[41-48] rtab[6]:=xmit_sz:56 maps[49-56] New TC util on a kernel WITHOUT support for cell_align rtab[0]:=xmit_sz:8 maps[0-7] rtab[1]:=xmit_sz:16 maps[8-15] rtab[2]:=xmit_sz:24 maps[16-23] rtab[3]:=xmit_sz:32 maps[24-31] rtab[4]:=xmit_sz:40 maps[32-39] rtab[5]:=xmit_sz:48 maps[40-47] rtab[6]:=xmit_sz:56 maps[48-55] Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2007-12-31 11:08:08 -08:00
Jesper Dangaard Brouer	d5f46f9cc3	Cleanup: tc_calc_rtable(). Change tc_calc_rtable() to take a tc_ratespec struct as an argument. (cell_log still needs to be passed on as a parameter, because -1 indicate that the cell_log needs to be computed by the function.). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2007-12-31 11:08:04 -08:00
Jesper Dangaard Brouer	bccd014b86	Overhead calculation is now done in the kernel. The only current user is HTB. HTB overhead argument is now passed on to the kernel (in the struct tc_ratespec). Also correct the data types. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2007-12-31 11:07:58 -08:00
Stephen Hemminger	6b1ac654e9	add decode of match rules Show ip address etc when decoding output of tc filter show Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>	2007-12-31 10:29:52 -08:00
Stephen Hemminger	c1b81cb5fe	netem potential dist table overflow Fix possible stack overflow when given distribution table that is too large. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-12-12 15:02:51 -08:00
Stephen Hemminger	e50e9f9123	Merge branch 'master' into net-2.6.25	2007-12-11 10:04:33 -08:00
François Delawarde	e22b42a2c1	tc mask patch Hello Stephen, As the current maintainer of iproute2 package, you could be interested in including the attached patch that allows using masks in the fw filter of the tc utility (very useful at least for me). AFAIK, it works at least from iproute2 version 2.6.20-?. Feel free to make the appropriate cleaning changes if necessary, or contact me if you see any trouble. Best regards, François Delawarde. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-12-11 09:35:49 -08:00
Herbert Xu	fc2d02069b	Add NAT action Here's a patch to add support for the nat action which is now in the kernel. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-12-11 09:33:55 -08:00
Stephen Hemminger	7ca30b789d	rlim qdisc support Add support for new rate limit qdisc Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-12-10 13:10:20 -08:00
Stephen Hemminger	ece02ea0a3	Fix breakage from netfilter/ip_tables header change. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-12-10 09:40:45 -08:00
Stephen Hemminger	45305c2470	add q_rr to tc Makefile Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-10-16 14:27:42 -07:00
Andreas Henriksson	64e2ad593b	Also do tc_core_time2big argument (long->unsigned). tc_core_time2big only used in tc/q_netem.c where it gets passed an unsigned. Signed-off-by: Andreas Henriksson <andreas@fatal.se> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-10-12 16:06:22 -07:00
Andreas Henriksson	57a800d45a	Switch helpers tc_core_{time2ktime,ktime2time} from long to unsigned as well. Follow up patch to "Fix overflow in time2tick / tick2time." which switches the remaining two helper functions from long to unsigned as well. These functions are only used in "tc/q_hfsc.c" where both the passed argument and the place the return value is stored are unsigned/u32 variables, so this change should be safe to make but hasn't been tested as extensively as the time2tick patch. Signed-off-by: Andreas Henriksson <andreas@fatal.se> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-10-12 16:06:21 -07:00
Andreas Henriksson	4475984498	Fix overflow in time2tick / tick2time. The helper functions gets passed an unsigned int, which gets cast to long and overflows. See http://bugs.debian.org/175462 Signed-off-by: Andreas Henriksson <andreas@fatal.se> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-10-12 14:56:35 -07:00
Lionel Elie Mamane	bc45ded42c	Fix ematch cmp and nbyte syntax help text. The help/usage screen of ematch cmp and nbyte say recognised symbolic values for "layer FOO" are link, header and next-header, but the code does _not_ implement that: it will recognise "next-header" as what is supposed to be "header" and will not recognise "header". The right symbolic values seem to be link, network, transport. Here is a patch that changes the help/usage screen to match the code. (http://bugs.debian.org/438653) Signed-off-by: Andreas Henriksson <andreas@fatal.se> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-10-12 14:56:31 -07:00
Patrick McHardy	6140785236	Fix meta ematch usage of 0 values em_meta doesn't send 0 values to the kernel. breaking matching on them and resulting in "Missing value TLV" messages on dump. Signed-off-by: Patrick McHardy <kaber@trash.net>	2007-08-22 10:52:13 -07:00
Stephen Hemminger	f7cd9b0354	Fix m_ipt build Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-08-22 10:33:33 -07:00
PJ Waskiewicz	292ce96bca	iproute2: sch_rr support in tc This patch applies on top of Patrick McHardy's RTNETLINK patches to add nested compat attributes. This is needed to maintain ABI for sch_{rr\|prio} in the kernel with respect to tc. A new option, namely multiqueue, was added to sch_prio and sch_rr. This will allow a user to turn multiqueue support on for sch_prio or sch_rr at loadtime. Also, tc qdisc ls will display whether or not multiqueue is enabled on that qdisc. When in multiqueue mode, a user can specify a value of 0 for bands, and the number of bands will be created to match the number of queues on the device. This patch is to support the new sch_rr (round-robin) qdisc being proposed in NET for multiqueue network device support in the Linux network stack. It uses q_prio.c as the template, since the qdiscs are nearly identical, outside of the ->dequeue() routine. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>	2007-08-22 10:04:25 -07:00
Patrick McHardy	c29391c7c6	Bug fix tc action drop >>That command is from a script that used to work with iproute2-ss020116 >>(2002!), which had the following in tc/m_police.c: >> >>210 } else if (strcmp(argv, "action") == 0) { >>211 NEXT_ARG(); >>212 if (get_police_result(&p.action, &presult, argv)) { >> >>I don't know when that bit was dropped, but it used to be there. :-) > > > > Indeed, I missed that. I'll fix up the patch .. OK this patch fixes parsing of "action ...". I've removed the erroring on unknown arguments again since in that case the caller should continue parsing.	2007-08-22 10:01:10 -07:00
Patrick McHardy	720a2e8d99	TC action parsing bug fix > > Is it a bug that: > > # tc filter add dev eth0 parent 1: protocol ip prio 0 handle 0xfffffff > fw police rate 1 burst 1 mpu 0 mtu 1 action drop > ^^^^^^^^^^^ > creates a filter that looks like: > > # tc filter ls dev eth0 > filter parent 1: protocol ip pref 49152 fw > filter parent 1: protocol ip pref 49152 fw handle 0xfffffff police 0x1 > rate 0bit burst 0b mtu 1b action reclassify > ^^^^^^^^^^^^^^^^^ > ref -543190236 bind 4 > > (which reclassifies and thus lets 0xfffffff-marked packets through). > > I'm pretty sure this used to work under 2.4.x (though I no longer have a > 2.4 box to test with), but it hasn't worked on any of the 2.6.x kernels > I've tried (with both iproute2-ss060323 and 070710). Good catch. It seems this is merely a parsing error, iproute doesn't have an "action" parameter and aborts parsing, so it uses the default value of "RECLASSIFY". It never had this parameter, so this patch removes it from the help text and makes it return an error.	2007-08-22 10:00:41 -07:00
Stephen Hemminger	954df8c66f	Snapshot update for 2.6.22 Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-06-25 09:42:30 -07:00
Stephen Hemminger	aa27f88c84	Add TC_LIB_DIR environment variable. Don't hardcode /usr/lib/tc as a path Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-06-20 15:31:40 -07:00
Stephen Hemminger	30af998941	netem: static Make netem static rather than shared library. It saves problems on 64 bit platforms. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-06-20 15:20:22 -07:00
Patrick McHardy	c6ab5b8247	[Fwd: Re: more iproute2 issues (not critical)] This one also makes sense for the release I guess. -------- Original Message -------- Subject: Re: more iproute2 issues (not critical) Date: Sat, 31 Mar 2007 16:16:56 +0200 From: Patrick McHardy <kaber@trash.net> To: Denys <denys@visp.net.lb> CC: Stephen Hemminger <shemminger@linux-foundation.org>, netdev@vger.kernel.org References: <20070321175951.M73913@visp.net.lb> <46026717.9060909@trash.net> <20070322124533.M79867@visp.net.lb> <46027FF2.6020001@trash.net> <20070322101224.3e6bb899@freekitty> <20070331021401.M17326@visp.net.lb> <20070331023011.M8101@visp.net.lb> Denys wrote: > Ooops, sorry, it seems my fault, no library exist on this system. > But i guess it must not coredump in this case? Is it possible to check if > library not exist and just print some nice message? > It is trivial i guess. The problem is that lib_dir is NULL when calling get_target_names. This patch fixes it. [IPROUTE]: m_ipt: fix crash when dumping rules lib_dir is NULL when calling get_target_name, causing a NULL pointer dereference in the strlen call. Signed-off-by: Patrick McHardy <kaber@trash.net>	2007-06-20 10:52:22 -07:00
Thomas Graf	dcb283c300	iproute2: Support IFF_LOWER_UP and IFF_DORMANT In order to support these new flags add current linux/if.h into the directory with the local copies. This caused troubles with outdated redefinitions from net/if.h so I've removed the dependency on it. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-06-19 16:40:40 -07:00
Stephen Hemminger	891514473b	Revert "Increase internal clock resolution to nsec" This reverts `fd784ccaf6` commit. Thanks Stephen, but actually I think the last patch (increase clock resolution) shouldn't go in yet. I'm not done yet looking at all the compatibility issues and it does change the range of valid values for everything dealing with times. Most places I looked at still accept reasonable ranges, but I would feel more comfortable to make sure everything is fine first.	2007-03-14 10:14:07 -07:00
jamal	9aa446896e	Old bug on tc > It is in current git tree. A small fix attached after some testing. Please dont forget to apply my other patches. When you have them let me know so i can do some more testing. cheers, jamal [TC] Get iptables path selection to set correct path A small tweak on top of Stephens patch Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:43:24 -07:00
Patrick McHardy	fd784ccaf6	Increase internal clock resolution to nsec [IPROUTE]: Increase internal clock resolution to nsec Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:42:20 -07:00
Patrick McHardy	147e1d4b5a	Handle different kernel clock resolutions [IPROUTE]: Handle different kernel clock resolutions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:42:19 -07:00
Patrick McHardy	bd29e35d9d	Add sprint_ticks() function and use in CBQ [IPROUTE]: Add sprint_ticks() function and use in CBQ Add helper function to print ticks to avoid assumptions about clock resolution in CBQ. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:42:18 -07:00
Patrick McHardy	8f34caafbd	Replace "usec" by "time" in function names [IPROUTE]: Replace "usec" by "time" in function names Rename functions containing "usec" since they don't necessarily return usec units anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:42:17 -07:00
Patrick McHardy	f0bda7e5a5	Introduce TIME_UNITS_PER_SEC to represent internal clock resolution [IPROUTE]: Introduce TIME_UNITS_PER_SEC to represent internal clock resolution Introduce TIME_UNITS_PER_SEC and conversion functions between internal resolution and resolution expected by the kernel (currently implemented as NOPs, only needed by HFSC, which currently always uses microseconds). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:42:16 -07:00
Patrick McHardy	76dc0aa28f	Introduce tc_calc_xmitsize and use where appropriate [IPROUTE]: Introduce tc_calc_xmitsize and use where appropriate Add tc_calc_xmitsize() as complement to tc_calc_xmittime(), which calculates the size that can be transmitted at a given rate during a given time. Replace all expressions of the form "size = rate*tc_core_tick2usec(time))/1000000" by tc_calc_xmitsize() calls. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-03-13 14:42:15 -07:00
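
Note on commit eeee367d91 (rate table upper bound): the three mappings listed in that commit message can be reproduced with a small sketch. This is not the iproute2 or kernel code. The function names build_rtab and lookup_slot are hypothetical, and the lookup rule (add cell_align, clamp at zero, shift right by cell_log, with cell_align = -1 on kernels that support it) is an assumption inferred from the maps[a-b] ranges in the message. Following the message's own legend, each slot is described by the packet size it is costed at rather than by a transmit time.

```python
def build_rtab(cell_log, entries, upper_bound):
    """Return the packet size each rate-table slot is costed at.

    upper_bound=False models the old table (slot i costed at i << cell_log);
    upper_bound=True models the new table (slot i costed at (i + 1) << cell_log).
    """
    cell = 1 << cell_log
    if upper_bound:
        return [(i + 1) * cell for i in range(entries)]
    return [i * cell for i in range(entries)]


def lookup_slot(pkt_len, cell_log, cell_align=0):
    """Assumed lookup: apply cell_align, clamp at zero, then shift."""
    return max(pkt_len + cell_align, 0) >> cell_log


if __name__ == "__main__":
    cell_log = 3
    old = build_rtab(cell_log, 7, upper_bound=False)
    new = build_rtab(cell_log, 7, upper_bound=True)

    # Old table, old lookup: a 15-byte packet is costed as 8 bytes.
    print("old:", old[lookup_slot(15, cell_log)])
    # New table with the assumed cell_align of -1: costed as 16 bytes.
    print("new, cell_align:", new[lookup_slot(15, cell_log, cell_align=-1)])
    # New table on a kernel without cell_align: also costed as 16 bytes.
    print("new, no cell_align:", new[lookup_slot(15, cell_log)])
```

Under these assumptions the new table never costs a packet below its real size, which is the "better to overestimate than underestimate" point quoted in the commit message. The old table can undercount by up to one cell minus one byte.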


1035 Commits