Add missing files.

shemminger 2006-01-10 18:50:18 +00:00
parent d8a45819b2
commit 143969f24b
3 changed files with 1066 additions and 0 deletions

doc/actions/actions-general Normal file

@ -0,0 +1,254 @@
This document is slightly dated but should give you an idea of how things
work.
What is it?
-----------
An extension to the filtering/classification architecture of Linux Traffic
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing.
i.e you could say something like:
-----
tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k
-----
which implies "if a packet is seen on the ingress of the lo device with
a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and
we execute a policing action which rate limits its bandwidth utilization
to 1.5Mbps".
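The rate and burst in the rule above can be read as a token bucket. Here is a rough sketch in shell arithmetic, not the kernel's implementation: it assumes tc's unit conventions (kbit = 1000 bits per second, k = 1024 bytes), a bucket that starts full, and millisecond timestamps passed in by hand. The names police, tokens and verdict are made up for the illustration.

```shell
#!/bin/sh
# Simplified token-bucket model of "police mtu 4000 rate 1500kbit burst 90k".
# Illustrative only; the kernel policer differs in detail.

RATE=$((1500 * 1000 / 8))   # 1500kbit -> 187500 bytes per second
BURST=$((90 * 1024))        # 90k      -> 92160 bytes of bucket depth
MTU=4000                    # packets larger than this always exceed

tokens=$BURST               # assumption: the bucket starts full
last_ms=0

# police NOW_MS LEN: sets $verdict to "conform" or "exceed".
# (Hypothetical helper; state lives in the globals above.)
police() {
    now_ms=$1
    len=$2
    # Refill for the elapsed time, capped at the bucket depth.
    tokens=$((tokens + (now_ms - last_ms) * RATE / 1000))
    if [ "$tokens" -gt "$BURST" ]; then
        tokens=$BURST
    fi
    last_ms=$now_ms
    if [ "$len" -le "$MTU" ] && [ "$len" -le "$tokens" ]; then
        tokens=$((tokens - len))
        verdict=conform
    else
        verdict=exceed
    fi
}

# Demo: count how many back-to-back 1500-byte packets fit in the burst.
count=0
police 0 1500
while [ "$verdict" = conform ]; do
    count=$((count + 1))
    police 0 1500
done
echo "$count packets conform before the first exceed"
```

The verdict is kept in a variable rather than echoed so that the bucket state survives between calls; calling police inside $(...) would run it in a subshell and lose the update.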
The new extensions allow for more than just policing actions to be added.
They are also fully backward compatible. If you have a kernel that doesn't
understand them, then the effect is null, i.e. if you have a newer tc
but older kernel, the actions are not installed. Likewise if you
have a newer kernel but older tc, obviously the tc will use current
syntax which will work fine. Of course to get the required effect you need
both newer tc and kernel. If you are reading this you have the
right tc ;->
A side effect is that we can now get stateless firewalling to work with tc.
Essentially this is now an alternative to iptables.
I won't go into details of my dislike for iptables at times, but
scalability is one of the main issues; however, if you need stateful
classification - use netfilter (for now).
This stuff works on both ingress and egress qdiscs.
Features
--------
1) new additional syntax and actions enabled. Note old syntax is still valid.
Essentially this is still the same syntax as tc with a new construct
"action". The syntax is of the form:
tc filter add <DEVICE> parent 1:0 protocol ip prio 10 <Filter description>
flowid 1:1 action <ACTION description>*
You can have as many actions as you want (within reason).
In the past the only real action was the policer; i.e. you could do something
along the lines of:
tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
police mtu 4000 rate 1500kbit burst 90k
Although you can still use the same syntax, now you can say:
tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
action police mtu 4000 rate 1500kbit burst 90k
"generic Actions" (gact) at the moment are:
{ drop, pass, reclassify, continue}
(If you have others not listed here, give me a reason and we will add them)
+drop says to drop the packet
+pass says to accept it
+reclassify requests for reclassification of the packet
+continue requests for next lookup to match
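To make the "action <ACTION description>*" form concrete, here is a small dry-run sketch that only builds and prints a command line with any number of action clauses. tc_filter_cmd is a hypothetical helper, not part of tc, and the prio and flowid values are simply copied from the examples above.

```shell
#!/bin/sh
# Dry run only: builds and prints the command line, nothing is executed.
# tc_filter_cmd is a hypothetical helper, not part of tc.
tc_filter_cmd() {
    dev=$1
    parent=$2
    match=$3
    shift 3
    cmd="tc filter add dev $dev parent $parent protocol ip prio 10 u32 match $match flowid 1:1"
    # Each remaining argument is one action description.
    for act in "$@"; do
        cmd="$cmd action $act"
    done
    echo "$cmd"
}

tc_filter_cmd lo 1:0 "ip src 127.0.0.1/32" "police mtu 4000 rate 1500kbit burst 90k"
```

Passing more action descriptions as extra arguments appends one "action ..." clause per argument, in the order given.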
2) In order to take advantage of some of the targets written by the
iptables people, a classifier can have a packet being massaged by an
iptables target. I have only tested with mangler targets up to now.
(In fact, anything that is not in the mangling table is disabled right now.)
In terms of hooks:
*ingress is mapped to pre-routing hook
*egress is mapped to post-routing hook
I don't see much value in the other hooks; if you do, email me good
reasons, the addition is trivial.
Example syntax for iptables targets usage becomes:
tc filter add ..... u32 <u32 syntax> action ipt -j <iptables target syntax>
example:
tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:12 \
action ipt -j mark --set-mark 2
3) A feature i call pipe
The motivation is derived from Unix pipe mechanism but applied to packets.
Essentially take a matching packet and pass it through
action1 | action2 | action3 etc.
You could do something similar to this with the tc policer and the "continue"
operator but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quite inefficient);
as an example -- and please note that this is just an example _not_ The
Word You've Been Waiting For (yes i have had problems giving examples
which ended up becoming dogma in documents and people modifying them a little
to look clever);
i selected the metering rates to be small so that i can show better how
things work.
The script below does the following:
- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesn't exceed the rate, this is where we terminate action execution.
- If it does exceed its rate, its "color" changes to a mark of 2 and it is
then passed through a second meter.
- The second meter is shared across all flows on that device [i am surprised
that this seems to be not a well-known feature of the policer; Bert was telling
me that someone was writing a qdisc just to do sharing across multiple devices;
it must be the summer heat again; we've had someone doing that every year around
summer -- the key to sharing is to use an operator "index" in your policer
rules (example "index 20"). All your rules have to use the same index to
share.]
- If the second meter is exceeded the color of the flow changes further to 3.
- We then pass the packet to another meter which is shared across all devices
in the system. If this meter is exceeded we drop the packet.
Note the mark can be used further up the system to do things like policy
or more interesting things on the egress.
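Before the real script, the control flow of the steps listed above can be mimicked with a toy model. This is only a sketch: each meter is reduced to a fixed byte budget with no refill, the budget values are invented, and run_pipe is a hypothetical helper. It shows where execution stops, not how real rate limiting behaves.

```shell
#!/bin/sh
# Toy model of the mark/meter pipe. Budgets are hypothetical byte counts
# with no refill, so this shows control flow only, not real policing.

flow_budget=200     # per-flow meter (first policer)
dev_budget=200      # meter shared by all flows on the device ("index 30")
sys_budget=200      # meter shared by all devices ("index 20")

# run_pipe LEN: sets $mark and $verdict for one packet.
run_pipe() {
    len=$1
    mark=1
    if [ "$len" -le "$flow_budget" ]; then
        flow_budget=$((flow_budget - len))
        verdict=accept          # within rate: action execution stops here
        return 0
    fi
    mark=2                      # exceeded: recolor and pipe to next meter
    if [ "$len" -le "$dev_budget" ]; then
        dev_budget=$((dev_budget - len))
        verdict=accept
        return 0
    fi
    mark=3                      # exceeded again: recolor, try the last meter
    if [ "$len" -le "$sys_budget" ]; then
        sys_budget=$((sys_budget - len))
        verdict=accept
        return 0
    fi
    verdict=drop                # last meter exceeded: drop on the floor
}

# Demo: ten 84-byte packets through the pipe.
i=0
while [ "$i" -lt 10 ]; do
    run_pipe 84
    echo "packet $i: mark $mark, $verdict"
    i=$((i + 1))
done
```

A packet only reaches a later meter after exceeding every earlier one, which is why the mark it carries tells you how far down the pipe it travelled.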
------------------ cut here -------------------------------
#
# Add an ingress qdisc on eth0
tc qdisc add dev eth0 ingress
#
# The filter below chains six actions on incoming packets from 10.0.0.21.
# (A comment cannot sit between backslash-continued lines, so the steps
# are described here and the command follows in one piece.)
# 1. first give it a mark of 1
# 2. then pass it through a policer which allows 1kbps; if the flow
#    doesn't exceed that rate, this is where we stop, if it exceeds we
#    pipe the packet to the next action
# 3. which marks the packet fwmark as 2 and pipes
# 4. next attempt to borrow b/width from a meter
#    used across all flows incoming on eth0 ("index 30")
#    and if that is exceeded we pipe to the next action
# 5. mark it as fwmark 3 if exceeded
# 6. and then attempt to borrow from a meter used by all devices in the
#    system ("index 20"). Should this be exceeded, drop the packet on
#    the floor.
tc filter add dev eth0 parent ffff: protocol ip prio 1 \
u32 match ip src 10.0.0.21/32 flowid 1:15 \
action ipt -j mark --set-mark 1 index 2 \
action police rate 1kbit burst 9k pipe \
action ipt -j mark --set-mark 2 \
action police index 30 mtu 5000 rate 1kbit burst 10k pipe \
action ipt -j mark --set-mark 3 \
action police index 20 mtu 5000 rate 1kbit burst 90k drop
---------------------------------
Now let's see the actions installed with
"tc filter show parent ffff: dev eth0"
-------- output -----------
jroot# tc filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
match 0a000015/ffffffff at 12
-------------------------------
Note the ordering of the actions is based on the order in which we entered
them. In the future i will add explicit priorities.
Now let's run a ping -f from 10.0.0.21 to this host; stop the ping after
you see a few lines of dots
----
[root@jzny hadi]# ping -f 10.0.0.22
PING 10.0.0.22 (10.0.0.22): 56 data bytes
....................................................................................................................................................................................................................................................................................................................................................................................................................................................
--- 10.0.0.22 ping statistics ---
2248 packets transmitted, 1811 packets received, 19% packet loss
round-trip min/avg/max = 0.7/9.3/20.1 ms
-----------------------------
Now let's take a look at the stats with "tc -s filter show parent ffff: dev eth0"
--------------
jroot# tc -s filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
match 0a000015/ffffffff at 12
-------------------------------
Neat, eh?
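The counters in the output above are consistent with each other, and a few lines of shell arithmetic make that visible. The 84-byte packet size is an inference from the numbers (56 data bytes plus an 8-byte ICMP header and a 20-byte IP header), not something the output states; the variable names are made up.

```shell
#!/bin/sh
# Cross-check the counters printed by "tc -s filter show".
# PKT is an assumption inferred from the numbers:
# 56 data bytes + 8 ICMP header + 20 IP header.
PKT=$((56 + 8 + 20))

sent=2248        # packets seen by the first action
over_flow=2122   # overlimits of the per-flow policer (action 2)
over_dev=1945    # overlimits of the device-wide policer (index 30)
over_sys=437     # overlimits of the system-wide policer (index 20)

# Each policer only passes its overlimit packets on to the next action,
# so the next action's byte counter is overlimits * packet size.
echo "bytes at actions 1 and 2: $((sent * PKT))"
echo "bytes at actions 3 and 4: $((over_flow * PKT))"
echo "bytes at actions 5 and 6: $((over_dev * PKT))"

# Packets over the last meter's limit are dropped; the rest got through.
received=$((sent - over_sys))
echo "packets that got through: $received"
echo "loss: $((over_sys * 100 / sent))%"
```

The results line up with the byte counts in the tc output and with the 1811 packets received and 19% loss that ping reported.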
Wanna write an action module?
------------------------------
It's easy. Either look at the code or send me email. I will document it at
some point; will also accept documentation.
TODO
----
Lotsa goodies/features coming. Requests also being accepted.
At the moment the focus has been on getting the architecture in place.
Expect new things in the spare time i have to work on this
(particularly around end of year when i typically get time off
from work).

doc/actions/dummy-README Normal file

@ -0,0 +1,155 @@
Advantage over current IMQ: cleaner, in particular on SMP,
with a _lot_ less code.
Old Dummy device functionality is preserved while new one only
kicks in if you use actions.
IMQ USES
--------
As far as i know the reasons listed below are why people use IMQ.
It would be nice to know of anything else that i missed.
1) qdiscs/policies that are per device as opposed to system wide.
IMQ allows for sharing.
2) Allows for queueing incoming traffic for shaping instead of
dropping. I am not aware of any study that shows policing is
worse than shaping in achieving the end goal of rate control.
I would be interested if anyone is experimenting.
3) Very interesting use: if you are serving p2p you may wanna give
preference to your own locally originated traffic (when responses come back)
vs someone using your system to do bittorrent. So QoSing based on state
comes in as the solution. What people did to achieve this was stick
the IMQ somewhere around the prelocal hook.
I think this is a pretty neat feature to have in Linux in general.
(i.e not just for IMQ).
But i won't go back to putting netfilter hooks in the device to satisfy
this. I also don't think it's worth hacking dummy some more to be
aware of say L3 info and play ip rule tricks to achieve this.
--> Instead the plan is to have a conntrack related action. This action will
selectively either query/create conntrack state on incoming packets.
Packets could then be redirected to dummy based on what happens -> eg
on incoming packets; if we find they are of known state we could send to
a different queue than one which didn't have existing state. This
all however is dependent on whatever rules the admin enters.
At the moment this function does not exist. I have decided, instead
of sitting on the patch, to release it and then if there's pressure i will
add this feature.
What you can do with dummy currently with actions
--------------------------------------------------
Let's say you are policing packets from alias 192.168.200.200/32;
you don't want those to exceed 100kbps going out.
tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
match ip src 192.168.200.200/32 flowid 1:2 \
action police rate 100kbit burst 90k drop
If you run tcpdump on eth0 you will see all packets going out
with src 192.168.200.200/32, whether dropped or not.
Extend the rule a little to see only the ones that made it out:
tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
match ip src 192.168.200.200/32 flowid 1:2 \
action police rate 100kbit burst 90k drop \
action mirred egress mirror dev dummy0
Now fire tcpdump on dummy0 to see only those packets ..
tcpdump -n -i dummy0 -x -e -t
Essentially a good debugging/logging interface.
If you replace mirror with redirect, those packets will be
blackholed and will never make it out. This redirect behavior
changes with new patch (but not the mirror).
Below is what you can do with the patch to provide the functionality
that most people use IMQ for:
--------
export TC="/sbin/tc"
$TC qdisc add dev dummy0 root handle 1: prio
$TC qdisc add dev dummy0 parent 1:1 handle 10: sfq
$TC qdisc add dev dummy0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev dummy0 parent 1:3 handle 30: sfq
$TC filter add dev dummy0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
$TC filter add dev dummy0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
ifconfig dummy0 up
$TC qdisc add dev eth0 ingress
# redirect all IP packets arriving in eth0 to dummy0
# use mark 1 --> puts them onto class 1:1
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 \
action ipt -j MARK --set-mark 1 \
action mirred egress redirect dev dummy0
--------
Run A Little test:
from another machine ping so that you have packets going into the box:
-----
[root@jzny action-tests]# ping 10.22
PING 10.22 (10.0.0.22): 56 data bytes
64 bytes from 10.0.0.22: icmp_seq=0 ttl=64 time=2.8 ms
64 bytes from 10.0.0.22: icmp_seq=1 ttl=64 time=0.6 ms
64 bytes from 10.0.0.22: icmp_seq=2 ttl=64 time=0.6 ms
--- 10.22 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/1.3/2.8 ms
[root@jzny action-tests]#
-----
Now look at some stats:
---
[root@jmandrake]:~# $TC -s filter show parent ffff: dev eth0
filter protocol ip pref 10 u32
filter protocol ip pref 10 u32 fh 800: ht divisor 1
filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
match 00000000/00000000 at 0
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1
index 1 ref 1 bind 1 installed 4195sec used 27sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
action order 2: mirred (Egress Redirect to device dummy0) stolen
index 1 ref 1 bind 1 installed 165 sec used 27 sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# $TC -s qdisc
qdisc sfq 30: dev dummy0 limit 128p quantum 1514b
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc tbf 20: dev dummy0 rate 20Kbit burst 1575b lat 2147.5s
Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 10: dev dummy0 limit 128p quantum 1514b
Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
qdisc prio 1: dev dummy0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
qdisc ingress ffff: dev eth0 ----------------
Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# ifconfig dummy0
dummy0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:6 errors:0 dropped:3 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:504 (504.0 b) TX bytes:252 (252.0 b)
-----
Dummy continues to behave like it always did.
If you send it any packet not originating from the actions, it will drop it.
[In this case the three dropped packets were ipv6 ndisc].
cheers,
jamal

ip/ipntable.c Normal file

@ -0,0 +1,657 @@
/*
* Copyright (C)2006 USAGI/WIDE Project
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* based on ipneigh.c
*/
/*
* Authors:
* Masahide NAKAMURA @USAGI
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include "utils.h"
#include "ip_common.h"
static struct
{
int family;
int index;
#define NONE_DEV (-1)
char name[1024];
} filter;
static void usage(void) __attribute__((noreturn));
static void usage(void)
{
fprintf(stderr,
"Usage: ip ntable change name NAME [ dev DEV ]\n"
" [ thresh1 VAL ] [ thresh2 VAL ] [ thresh3 VAL ] [ gc_int MSEC ]\n"
" [ PARMS ]\n"
"Usage: ip ntable show [ dev DEV ] [ name NAME ]\n"
"PARMS := [ base_reachable MSEC ] [ retrans MSEC ] [ gc_stale MSEC ]\n"
" [ delay_probe MSEC ] [ queue LEN ]\n"
" [ app_probes VAL ] [ ucast_probes VAL ] [ mcast_probes VAL ]\n"
" [ anycast_delay MSEC ] [ proxy_delay MSEC ] [ proxy_queue LEN ]\n"
" [ locktime MSEC ]\n"
);
exit(-1);
}
static int ipntable_modify(int cmd, int flags, int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ndtmsg ndtm;
char buf[1024];
} req;
char *namep = NULL;
char *threshsp = NULL;
char *gc_intp = NULL;
char parms_buf[1024];
struct rtattr *parms_rta = (struct rtattr *)parms_buf;
int parms_change = 0;
memset(&req, 0, sizeof(req));
req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndtmsg));
req.n.nlmsg_flags = NLM_F_REQUEST|flags;
req.n.nlmsg_type = cmd;
req.ndtm.ndtm_family = preferred_family;
req.ndtm.ndtm_pad1 = 0;
req.ndtm.ndtm_pad2 = 0;
memset(&parms_buf, 0, sizeof(parms_buf));
parms_rta->rta_type = NDTA_PARMS;
parms_rta->rta_len = RTA_LENGTH(0);
while (argc > 0) {
if (strcmp(*argv, "name") == 0) {
int len;
NEXT_ARG();
if (namep)
duparg("NAME", *argv);
namep = *argv;
len = strlen(namep) + 1;
addattr_l(&req.n, sizeof(req), NDTA_NAME, namep, len);
} else if (strcmp(*argv, "thresh1") == 0) {
__u32 thresh1;
NEXT_ARG();
threshsp = *argv;
if (get_u32(&thresh1, *argv, 0))
invarg("\"thresh1\" value is invalid", *argv);
addattr32(&req.n, sizeof(req), NDTA_THRESH1, thresh1);
} else if (strcmp(*argv, "thresh2") == 0) {
__u32 thresh2;
NEXT_ARG();
threshsp = *argv;
if (get_u32(&thresh2, *argv, 0))
invarg("\"thresh2\" value is invalid", *argv);
addattr32(&req.n, sizeof(req), NDTA_THRESH2, thresh2);
} else if (strcmp(*argv, "thresh3") == 0) {
__u32 thresh3;
NEXT_ARG();
threshsp = *argv;
if (get_u32(&thresh3, *argv, 0))
invarg("\"thresh3\" value is invalid", *argv);
addattr32(&req.n, sizeof(req), NDTA_THRESH3, thresh3);
} else if (strcmp(*argv, "gc_int") == 0) {
__u64 gc_int;
NEXT_ARG();
gc_intp = *argv;
if (get_u64(&gc_int, *argv, 0))
invarg("\"gc_int\" value is invalid", *argv);
addattr_l(&req.n, sizeof(req), NDTA_GC_INTERVAL,
&gc_int, sizeof(gc_int));
} else if (strcmp(*argv, "dev") == 0) {
__u32 ifindex;
NEXT_ARG();
ifindex = ll_name_to_index(*argv);
if (ifindex == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n", *argv);
return -1;
}
rta_addattr32(parms_rta, sizeof(parms_buf),
NDTPA_IFINDEX, ifindex);
} else if (strcmp(*argv, "base_reachable") == 0) {
__u64 breachable;
NEXT_ARG();
if (get_u64(&breachable, *argv, 0))
invarg("\"base_reachable\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_BASE_REACHABLE_TIME,
&breachable, sizeof(breachable));
parms_change = 1;
} else if (strcmp(*argv, "retrans") == 0) {
__u64 retrans;
NEXT_ARG();
if (get_u64(&retrans, *argv, 0))
invarg("\"retrans\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_RETRANS_TIME,
&retrans, sizeof(retrans));
parms_change = 1;
} else if (strcmp(*argv, "gc_stale") == 0) {
__u64 gc_stale;
NEXT_ARG();
if (get_u64(&gc_stale, *argv, 0))
invarg("\"gc_stale\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_GC_STALETIME,
&gc_stale, sizeof(gc_stale));
parms_change = 1;
} else if (strcmp(*argv, "delay_probe") == 0) {
__u64 delay_probe;
NEXT_ARG();
if (get_u64(&delay_probe, *argv, 0))
invarg("\"delay_probe\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_DELAY_PROBE_TIME,
&delay_probe, sizeof(delay_probe));
parms_change = 1;
} else if (strcmp(*argv, "queue") == 0) {
__u32 queue;
NEXT_ARG();
if (get_u32(&queue, *argv, 0))
invarg("\"queue\" value is invalid", *argv);
if (!parms_rta)
parms_rta = (struct rtattr *)&parms_buf;
rta_addattr32(parms_rta, sizeof(parms_buf),
NDTPA_QUEUE_LEN, queue);
parms_change = 1;
} else if (strcmp(*argv, "app_probes") == 0) {
__u32 aprobe;
NEXT_ARG();
if (get_u32(&aprobe, *argv, 0))
invarg("\"app_probes\" value is invalid", *argv);
rta_addattr32(parms_rta, sizeof(parms_buf),
NDTPA_APP_PROBES, aprobe);
parms_change = 1;
} else if (strcmp(*argv, "ucast_probes") == 0) {
__u32 uprobe;
NEXT_ARG();
if (get_u32(&uprobe, *argv, 0))
invarg("\"ucast_probes\" value is invalid", *argv);
rta_addattr32(parms_rta, sizeof(parms_buf),
NDTPA_UCAST_PROBES, uprobe);
parms_change = 1;
} else if (strcmp(*argv, "mcast_probes") == 0) {
__u32 mprobe;
NEXT_ARG();
if (get_u32(&mprobe, *argv, 0))
invarg("\"mcast_probes\" value is invalid", *argv);
rta_addattr32(parms_rta, sizeof(parms_buf),
NDTPA_MCAST_PROBES, mprobe);
parms_change = 1;
} else if (strcmp(*argv, "anycast_delay") == 0) {
__u64 anycast_delay;
NEXT_ARG();
if (get_u64(&anycast_delay, *argv, 0))
invarg("\"anycast_delay\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_ANYCAST_DELAY,
&anycast_delay, sizeof(anycast_delay));
parms_change = 1;
} else if (strcmp(*argv, "proxy_delay") == 0) {
__u64 proxy_delay;
NEXT_ARG();
if (get_u64(&proxy_delay, *argv, 0))
invarg("\"proxy_delay\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_PROXY_DELAY,
&proxy_delay, sizeof(proxy_delay));
parms_change = 1;
} else if (strcmp(*argv, "proxy_queue") == 0) {
__u32 pqueue;
NEXT_ARG();
if (get_u32(&pqueue, *argv, 0))
invarg("\"proxy_queue\" value is invalid", *argv);
rta_addattr32(parms_rta, sizeof(parms_buf),
NDTPA_PROXY_QLEN, pqueue);
parms_change = 1;
} else if (strcmp(*argv, "locktime") == 0) {
__u64 locktime;
NEXT_ARG();
if (get_u64(&locktime, *argv, 0))
invarg("\"locktime\" value is invalid", *argv);
rta_addattr_l(parms_rta, sizeof(parms_buf),
NDTPA_LOCKTIME,
&locktime, sizeof(locktime));
parms_change = 1;
} else {
invarg("unknown", *argv);
}
argc--; argv++;
}
if (!namep)
missarg("NAME");
if (!threshsp && !gc_intp && !parms_change) {
fprintf(stderr, "Not enough information: changeable attributes required.\n");
exit(-1);
}
if (parms_rta->rta_len > RTA_LENGTH(0)) {
addattr_l(&req.n, sizeof(req), NDTA_PARMS, RTA_DATA(parms_rta),
RTA_PAYLOAD(parms_rta));
}
if (rtnl_talk(&rth, &req.n, 0, 0, NULL, NULL, NULL) < 0)
exit(2);
return 0;
}
static const char *ntable_strtime_delta(__u32 msec)
{
static char str[32];
struct timeval now;
time_t t;
struct tm *tp;
if (msec == 0)
goto error;
memset(&now, 0, sizeof(now));
if (gettimeofday(&now, NULL) < 0) {
perror("gettimeofday");
goto error;
}
t = now.tv_sec - (msec / 1000);
tp = localtime(&t);
if (!tp)
goto error;
strftime(str, sizeof(str), "%Y-%m-%d %T", tp);
return str;
error:
strcpy(str, "(error)");
return str;
}
int print_ntable(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
{
FILE *fp = (FILE*)arg;
struct ndtmsg *ndtm = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[NDTA_MAX+1];
struct rtattr *tpb[NDTPA_MAX+1];
int ret;
if (n->nlmsg_type != RTM_NEWNEIGHTBL) {
fprintf(stderr, "Not NEIGHTBL: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
}
len -= NLMSG_LENGTH(sizeof(*ndtm));
if (len < 0) {
fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
return -1;
}
if (preferred_family && preferred_family != ndtm->ndtm_family)
return 0;
parse_rtattr(tb, NDTA_MAX, NDTA_RTA(ndtm),
n->nlmsg_len - NLMSG_LENGTH(sizeof(*ndtm)));
if (tb[NDTA_NAME]) {
char *name = RTA_DATA(tb[NDTA_NAME]);
if (strlen(filter.name) > 0 && strcmp(filter.name, name))
return 0;
}
if (tb[NDTA_PARMS]) {
parse_rtattr(tpb, NDTPA_MAX, RTA_DATA(tb[NDTA_PARMS]),
RTA_PAYLOAD(tb[NDTA_PARMS]));
if (tpb[NDTPA_IFINDEX]) {
__u32 ifindex = *(__u32 *)RTA_DATA(tpb[NDTPA_IFINDEX]);
if (filter.index && filter.index != ifindex)
return 0;
} else {
if (filter.index && filter.index != NONE_DEV)
return 0;
}
}
if (ndtm->ndtm_family == AF_INET)
fprintf(fp, "inet ");
else if (ndtm->ndtm_family == AF_INET6)
fprintf(fp, "inet6 ");
else if (ndtm->ndtm_family == AF_DECnet)
fprintf(fp, "dnet ");
else
fprintf(fp, "(%d) ", ndtm->ndtm_family);
if (tb[NDTA_NAME]) {
char *name = RTA_DATA(tb[NDTA_NAME]);
fprintf(fp, "%s ", name);
}
fprintf(fp, "%s", _SL_);
ret = (tb[NDTA_THRESH1] || tb[NDTA_THRESH2] || tb[NDTA_THRESH3] ||
tb[NDTA_GC_INTERVAL]);
if (ret)
fprintf(fp, " ");
if (tb[NDTA_THRESH1]) {
__u32 thresh1 = *(__u32 *)RTA_DATA(tb[NDTA_THRESH1]);
fprintf(fp, "thresh1 %u ", thresh1);
}
if (tb[NDTA_THRESH2]) {
__u32 thresh2 = *(__u32 *)RTA_DATA(tb[NDTA_THRESH2]);
fprintf(fp, "thresh2 %u ", thresh2);
}
if (tb[NDTA_THRESH3]) {
__u32 thresh3 = *(__u32 *)RTA_DATA(tb[NDTA_THRESH3]);
fprintf(fp, "thresh3 %u ", thresh3);
}
if (tb[NDTA_GC_INTERVAL]) {
__u64 gc_int = *(__u64 *)RTA_DATA(tb[NDTA_GC_INTERVAL]);
fprintf(fp, "gc_int %llu ", gc_int);
}
if (ret)
fprintf(fp, "%s", _SL_);
if (tb[NDTA_CONFIG] && show_stats) {
struct ndt_config *ndtc = RTA_DATA(tb[NDTA_CONFIG]);
fprintf(fp, " ");
fprintf(fp, "config ");
fprintf(fp, "key_len %u ", ndtc->ndtc_key_len);
fprintf(fp, "entry_size %u ", ndtc->ndtc_entry_size);
fprintf(fp, "entries %u ", ndtc->ndtc_entries);
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
fprintf(fp, "last_flush %s ",
ntable_strtime_delta(ndtc->ndtc_last_flush));
fprintf(fp, "last_rand %s ",
ntable_strtime_delta(ndtc->ndtc_last_rand));
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
fprintf(fp, "hash_rnd %u ", ndtc->ndtc_hash_rnd);
fprintf(fp, "hash_mask %08x ", ndtc->ndtc_hash_mask);
fprintf(fp, "hash_chain_gc %u ", ndtc->ndtc_hash_chain_gc);
fprintf(fp, "proxy_qlen %u ", ndtc->ndtc_proxy_qlen);
fprintf(fp, "%s", _SL_);
}
if (tb[NDTA_PARMS]) {
if (tpb[NDTPA_IFINDEX]) {
__u32 ifindex = *(__u32 *)RTA_DATA(tpb[NDTPA_IFINDEX]);
fprintf(fp, " ");
fprintf(fp, "dev %s ", ll_index_to_name(ifindex));
fprintf(fp, "%s", _SL_);
}
fprintf(fp, " ");
if (tpb[NDTPA_REFCNT]) {
__u32 refcnt = *(__u32 *)RTA_DATA(tpb[NDTPA_REFCNT]);
fprintf(fp, "refcnt %u ", refcnt);
}
if (tpb[NDTPA_REACHABLE_TIME]) {
__u64 reachable = *(__u64 *)RTA_DATA(tpb[NDTPA_REACHABLE_TIME]);
fprintf(fp, "reachable %llu ", reachable);
}
if (tpb[NDTPA_BASE_REACHABLE_TIME]) {
__u64 breachable = *(__u64 *)RTA_DATA(tpb[NDTPA_BASE_REACHABLE_TIME]);
fprintf(fp, "base_reachable %llu ", breachable);
}
if (tpb[NDTPA_RETRANS_TIME]) {
__u64 retrans = *(__u64 *)RTA_DATA(tpb[NDTPA_RETRANS_TIME]);
fprintf(fp, "retrans %llu ", retrans);
}
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
if (tpb[NDTPA_GC_STALETIME]) {
__u64 gc_stale = *(__u64 *)RTA_DATA(tpb[NDTPA_GC_STALETIME]);
fprintf(fp, "gc_stale %llu ", gc_stale);
}
if (tpb[NDTPA_DELAY_PROBE_TIME]) {
__u64 delay_probe = *(__u64 *)RTA_DATA(tpb[NDTPA_DELAY_PROBE_TIME]);
fprintf(fp, "delay_probe %llu ", delay_probe);
}
if (tpb[NDTPA_QUEUE_LEN]) {
__u32 queue = *(__u32 *)RTA_DATA(tpb[NDTPA_QUEUE_LEN]);
fprintf(fp, "queue %u ", queue);
}
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
if (tpb[NDTPA_APP_PROBES]) {
__u32 aprobe = *(__u32 *)RTA_DATA(tpb[NDTPA_APP_PROBES]);
fprintf(fp, "app_probes %u ", aprobe);
}
if (tpb[NDTPA_UCAST_PROBES]) {
__u32 uprobe = *(__u32 *)RTA_DATA(tpb[NDTPA_UCAST_PROBES]);
fprintf(fp, "ucast_probes %u ", uprobe);
}
if (tpb[NDTPA_MCAST_PROBES]) {
__u32 mprobe = *(__u32 *)RTA_DATA(tpb[NDTPA_MCAST_PROBES]);
fprintf(fp, "mcast_probes %u ", mprobe);
}
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
if (tpb[NDTPA_ANYCAST_DELAY]) {
__u64 anycast_delay = *(__u64 *)RTA_DATA(tpb[NDTPA_ANYCAST_DELAY]);
fprintf(fp, "anycast_delay %llu ", anycast_delay);
}
if (tpb[NDTPA_PROXY_DELAY]) {
__u64 proxy_delay = *(__u64 *)RTA_DATA(tpb[NDTPA_PROXY_DELAY]);
fprintf(fp, "proxy_delay %llu ", proxy_delay);
}
if (tpb[NDTPA_PROXY_QLEN]) {
__u32 pqueue = *(__u32 *)RTA_DATA(tpb[NDTPA_PROXY_QLEN]);
fprintf(fp, "proxy_queue %u ", pqueue);
}
if (tpb[NDTPA_LOCKTIME]) {
__u64 locktime = *(__u64 *)RTA_DATA(tpb[NDTPA_LOCKTIME]);
fprintf(fp, "locktime %llu ", locktime);
}
fprintf(fp, "%s", _SL_);
}
if (tb[NDTA_STATS] && show_stats) {
struct ndt_stats *ndts = RTA_DATA(tb[NDTA_STATS]);
fprintf(fp, " ");
fprintf(fp, "stats ");
fprintf(fp, "allocs %llu ", ndts->ndts_allocs);
fprintf(fp, "destroys %llu ", ndts->ndts_destroys);
fprintf(fp, "hash_grows %llu ", ndts->ndts_hash_grows);
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
fprintf(fp, "res_failed %llu ", ndts->ndts_res_failed);
fprintf(fp, "lookups %llu ", ndts->ndts_lookups);
fprintf(fp, "hits %llu ", ndts->ndts_hits);
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
fprintf(fp, "rcv_probes_mcast %llu ", ndts->ndts_rcv_probes_mcast);
fprintf(fp, "rcv_probes_ucast %llu ", ndts->ndts_rcv_probes_ucast);
fprintf(fp, "%s", _SL_);
fprintf(fp, " ");
fprintf(fp, "periodic_gc_runs %llu ", ndts->ndts_periodic_gc_runs);
fprintf(fp, "forced_gc_runs %llu ", ndts->ndts_forced_gc_runs);
fprintf(fp, "%s", _SL_);
}
fprintf(fp, "\n");
fflush(fp);
return 0;
}
void ipntable_reset_filter(void)
{
memset(&filter, 0, sizeof(filter));
}
static int ipntable_show(int argc, char **argv)
{
ipntable_reset_filter();
filter.family = preferred_family;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
if (strcmp("none", *argv) == 0)
filter.index = NONE_DEV;
else if ((filter.index = ll_name_to_index(*argv)) == 0)
invarg("\"DEV\" is invalid", *argv);
} else if (strcmp(*argv, "name") == 0) {
NEXT_ARG();
/* copy at most size-1 bytes so the zeroed last byte keeps the name NUL-terminated */
strncpy(filter.name, *argv, sizeof(filter.name) - 1);
} else
invarg("unknown", *argv);
argc--; argv++;
}
if (rtnl_wilddump_request(&rth, preferred_family, RTM_GETNEIGHTBL) < 0) {
perror("Cannot send dump request");
exit(1);
}
if (rtnl_dump_filter(&rth, print_ntable, stdout, NULL, NULL) < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
return 0;
}
int do_ipntable(int argc, char **argv)
{
ll_init_map(&rth);
if (argc > 0) {
if (matches(*argv, "change") == 0 ||
matches(*argv, "chg") == 0)
return ipntable_modify(RTM_SETNEIGHTBL,
NLM_F_REPLACE,
argc-1, argv+1);
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)
return ipntable_show(argc-1, argv+1);
if (matches(*argv, "help") == 0)
usage();
} else
return ipntable_show(0, NULL);
fprintf(stderr, "Command \"%s\" is unknown, try \"ip ntable help\".\n", *argv);
exit(-1);
}