515 lines
25 KiB
TeX
515 lines
25 KiB
TeX
\documentclass[12pt,twoside]{article}
|
|
|
|
\usepackage[hidelinks]{hyperref} % \url
|
|
\usepackage{booktabs} % nicer tabulars
|
|
\usepackage{fancyvrb}
|
|
\usepackage{fullpage}
|
|
\usepackage{float}
|
|
|
|
\newcommand{\iface}{\textit}
|
|
\newcommand{\cmd}{\texttt}
|
|
\newcommand{\man}{\textit}
|
|
\newcommand{\qdisc}{\texttt}
|
|
\newcommand{\filter}{\texttt}
|
|
|
|
\begin{document}
|
|
\title{QoS in Linux with TC and Filters}
|
|
\author{Phil Sutter (phil@nwl.cc)}
|
|
\date{January 2016}
|
|
\maketitle
|
|
|
|
Standard practice when transmitting packets over a medium which may block (due
|
|
to congestion, e.g.) is to use a queue which temporarily holds these packets. In
|
|
Linux, this queueing approach is where QoS happens: A Queueing Discipline
|
|
(qdisc) holds multiple packet queues with different priorities for dequeueing to
|
|
the network driver. The classification (i.e. deciding which queue a packet
|
|
should go into) is typically done based on Type Of Service (IPv4) or Traffic
|
|
Class (IPv6) header fields but depending on qdisc implementation, might be
|
|
controlled by the user as well.
|
|
|
|
Qdiscs come in two flavors, classful or classless. While classless qdiscs are
|
|
not as flexible as classful ones, they also require much less customizing. Often
|
|
it is enough to just attach them to an interface, without exact knowledge of
|
|
what is done internally. Classful qdiscs are the exact opposite: flexible in
|
|
application, they are often not even usable without insightful configuration.
|
|
|
|
As the name implies, classful qdiscs provide configurable classes to sort
|
|
traffic into. In it's basic form, this is not much different than, say, the
|
|
classless \qdisc{pfifo\_fast} which holds three queues and classifies per
|
|
packet upon priority field. Though typically classes go beyond that by
|
|
supporting nesting and additional characteristics like e.g. maximum traffic
|
|
rate or quantum.
|
|
|
|
When it comes to controlling the classification process, filters come into play.
|
|
They attach to the parent of a set of classes (i.e. either the qdisc itself or
|
|
a parent class) and specify how a packet (or it's associated flow) has to look
|
|
like in order to suit a given class. To overcome this simplification, it is
|
|
possible to attach multiple filters to the same parent, which then consults each
|
|
of them in row until the first one accepts the packet.
|
|
|
|
Before getting into detail about what filters there are and how to use them, a
|
|
simple setup of a qdisc with classes is necessary:
|
|
\begin{figure}[H]
|
|
\begin{Verbatim}
|
|
.-------------------------------------------------------.
|
|
| |
|
|
| HTB |
|
|
| |
|
|
| .----------------------------------------------------.|
|
|
| | ||
|
|
| | Class 1:1 ||
|
|
| | ||
|
|
| | .---------------..---------------..---------------.||
|
|
| | | || || |||
|
|
| | | Class 1:10 || Class 1:20 || Class 1:30 |||
|
|
| | | || || |||
|
|
| | | .------------.|| .------------.|| .------------.|||
|
|
| | | | ||| | ||| | ||||
|
|
| | | | fq_codel ||| | fq_codel ||| | fq_codel ||||
|
|
| | | | ||| | ||| | ||||
|
|
| | | '------------'|| '------------'|| '------------'|||
|
|
| | '---------------''---------------''---------------'||
|
|
| '----------------------------------------------------'|
|
|
'-------------------------------------------------------'
|
|
\end{Verbatim}
|
|
\end{figure}
|
|
\noindent
|
|
The following commands establish the basic setup shown:
|
|
\begin{Verbatim}
|
|
(1) # tc qdisc replace dev eth0 root handle 1: htb default 30
|
|
(2) # tc class add dev eth0 parent 1: classid 1:1 htb rate 95mbit
|
|
(3) # alias tclass='tc class add dev eth0 parent 1:1'
|
|
(4) # tclass classid 1:10 htb rate 1mbit ceil 20mbit prio 1
|
|
(4) # tclass classid 1:20 htb rate 90mbit ceil 95mbit prio 2
|
|
(4) # tclass classid 1:30 htb rate 1mbit ceil 95mbit prio 3
|
|
(5) # tc qdisc add dev eth0 parent 1:10 fq_codel
|
|
(5) # tc qdisc add dev eth0 parent 1:20 fq_codel
|
|
(5) # tc qdisc add dev eth0 parent 1:30 fq_codel
|
|
\end{Verbatim}
|
|
A little explanation for the unfamiliar reader:
|
|
\begin{enumerate}
|
|
\item Replace the root qdisc of \iface{eth0} by an instance of \qdisc{HTB}.
|
|
Specifying the handle is necessary so it can be referenced in consecutive
|
|
calls to \cmd{tc}. The default class for unclassified traffic is set to
|
|
30.
|
|
\item Create a single top-level class with handle 1:1 which limits the total
|
|
bandwidth allowed to 95mbit/s. It is assumed that \iface{eth0} is a 100mbit/s link,
|
|
staying a little below that helps to keep the main point of enqueueing in
|
|
the qdisc layer instead of the interface hardware queue or at another
|
|
bottleneck in the network.
|
|
\item Define an alias for the common part of the remaining three calls in order
|
|
to improve readability. This means all remaining classes are attached to the
|
|
common parent class from (2).
|
|
\item Create three child classes for different uses: Class 1:10 has highest
|
|
priority but is tightly limited in bandwidth - fine for interactive
|
|
connections. Class 1:20 has mid priority and high guaranteed bandwidth, for
|
|
high priority bulk traffic. Finally, there's the default class 1:30 with
|
|
lowest priority, low guaranteed bandwidth and the ability to use the full
|
|
link in case it's unused otherwise. This should be fine for uninteresting
|
|
traffic not explicitly taken care of.
|
|
\item Attach a leaf qdisc to each of the child classes created in (4). Since
|
|
\qdisc{HTB} by default attaches \qdisc{pfifo} as leaf qdisc, this step is optional. Still,
|
|
the fairness between different flows provided by the classless \qdisc{fq\_codel} is
|
|
worth the effort.
|
|
\end{enumerate}
|
|
More information about the qdiscs and fine-tuning parameters can be found in
|
|
\man{tc-htb(8)} and \man{tc-fq\_codel(8)}.
|
|
|
|
Without any additional setup done, now all traffic leaving \iface{eth0} is shaped to
|
|
95mbit/s and directed through class 1:30. This can be verified by looking at the
|
|
\texttt{Sent} field of the class statistics printed via \cmd{tc -s class show dev eth0}:
|
|
Only the root class 1:1 and it's child 1:30 should show any traffic.
|
|
|
|
|
|
\section*{Finally time to start filtering!}
|
|
|
|
Let's begin with a simple one, i.e. reestablishing what \qdisc{pfifo\_fast} did
|
|
automatically based on TOS/Priority field. Linux internally translates the
|
|
header field into the priority field of struct skbuff, which
|
|
\qdisc{pfifo\_fast} uses for
|
|
classification. \man{tc-prio(8)} contains a table listing the priority (and
|
|
ultimately, \qdisc{pfifo\_fast} queue index) each TOS value is being translated into.
|
|
Here is a shorter version:
|
|
\begin{center}
|
|
\begin{tabular}{lll}
|
|
TOS Values & Linux Priority (Number) & Queue Index \\
|
|
\midrule
|
|
0x0 - 0x6 & Best Effort (0) & 1 \\
|
|
0x8 - 0xe & Bulk (2) & 2 \\
|
|
0x10 - 0x16 & Interactive (6) & 0 \\
|
|
0x18 - 0x1e & Interactive Bulk (4) & 1 \\
|
|
\end{tabular}
|
|
\end{center}
|
|
Using the \filter{basic} filter, it is possible to match packets based on that skbuff
|
|
field, which has the added benefit of being IP version agnostic. Since the
|
|
\qdisc{HTB} setup above defaults to class ID 1:30, the Bulk priority can be
|
|
ignored. The \filter{basic} filter allows to combine matches, therefore we get along
|
|
with only two filters:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent 1: basic \
|
|
match 'meta(priority eq 6)' classid 1:10
|
|
# tc filter add dev eth0 parent 1: basic \
|
|
match 'meta(priority eq 0)' \
|
|
or 'meta(priority eq 4)' classid 1:20
|
|
\end{Verbatim}
|
|
A detailed description of the \filter{basic} filter and the ematch syntax it uses can be
|
|
found in \man{tc-basic(8)} and \man{tc-ematch(8)}.
|
|
|
|
Obviously, this first example cries for optimization. A simple one would be to
|
|
just change the default class from 1:30 to 1:20, so filters are only needed for
|
|
Bulk and Interactive priorities:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent 1: basic \
|
|
match 'meta(priority eq 6)' classid 1:10
|
|
# tc filter add dev eth0 parent 1: basic \
|
|
match 'meta(priority eq 2)' classid 1:20
|
|
\end{Verbatim}
|
|
Given that class IDs are random, choosing them wisely allows for a direct
|
|
mapping. So first, recreate the qdisc and classes configuration:
|
|
\begin{Verbatim}
|
|
# tc qdisc replace dev eth0 root handle 1: htb default 10
|
|
# tc class add dev eth0 parent 1: classid 1:1 htb rate 95mbit
|
|
# alias tclass='tc class add dev eth0 parent 1:1'
|
|
# tclass classid 1:16 htb rate 1mbit ceil 20mbit prio 1
|
|
# tclass classid 1:10 htb rate 90mbit ceil 95mbit prio 2
|
|
# tclass classid 1:12 htb rate 1mbit ceil 95mbit prio 3
|
|
# tc qdisc add dev eth0 parent 1:16 fq_codel
|
|
# tc qdisc add dev eth0 parent 1:10 fq_codel
|
|
# tc qdisc add dev eth0 parent 1:12 fq_codel
|
|
\end{Verbatim}
|
|
This is basically identical to above, but with changed leaf class IDs and the
|
|
second priority class being the default. Using the \filter{flow} filter with it's \texttt{map}
|
|
functionality, a single filter command is enough:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent 1: handle 0x1337 flow \
|
|
map key priority baseclass 1:10
|
|
\end{Verbatim}
|
|
The \filter{flow} filter now uses the priority value to construct a destination class ID
|
|
by adding it to the value of \texttt{baseclass}. While this works for priority values of
|
|
0, 2 and 6, it will result in non-existent class ID 1:14 for Interactive Bulk
|
|
traffic. In that case, the \qdisc{HTB} default applies so that traffic goes into class
|
|
ID 1:10 just as intended. Please note that specifying a handle is a mandatory
|
|
requirement by the \filter{flow} filter, although I didn't see where one would use that
|
|
later. For more information about \filter{flow}, see \man{tc-flow(8)}.
|
|
|
|
While \filter{flow} and \filter{basic} filters are relatively easy to apply and understand, they
|
|
are as well quite limited to their intended purpose. A more flexible option is
|
|
the \filter{u32} filter, which allows to match on arbitrary parts of the packet data -
|
|
yet only on that, not any meta data associated to it by the kernel (with the
|
|
exception of firewall mark value). So in order to continue this little
|
|
exercise with \filter{u32}, we have to base classification directly upon the actual TOS
|
|
value. An intuitive attempt might look like this:
|
|
\begin{Verbatim}
|
|
# alias tcfilter='tc filter add dev eth0 parent 1:'
|
|
# tcfilter u32 match ip dsfield 0x10 0x1e classid 1:16
|
|
# tcfilter u32 match ip dsfield 0x12 0x1e classid 1:16
|
|
# tcfilter u32 match ip dsfield 0x14 0x1e classid 1:16
|
|
# tcfilter u32 match ip dsfield 0x16 0x1e classid 1:16
|
|
# tcfilter u32 match ip dsfield 0x8 0x1e classid 1:12
|
|
# tcfilter u32 match ip dsfield 0xa 0x1e classid 1:12
|
|
# tcfilter u32 match ip dsfield 0xc 0x1e classid 1:12
|
|
# tcfilter u32 match ip dsfield 0xe 0x1e classid 1:12
|
|
\end{Verbatim}
|
|
The obvious drawback here is the amount of filters needed. And without the
|
|
default class, eight more filters would be necessary. This also has performance
|
|
implications: A packet with TOS value 0xe will be checked eight times in total
|
|
in order to determine it's destination class. While there's not much to be done
|
|
about the number of filters, at least the performance problem can be eliminated
|
|
by using \filter{u32}'s hash table support:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent 1: prio 99 handle 1: u32 divisor 16
|
|
\end{Verbatim}
|
|
This creates a hash table with 16 buckets. The table size is arbitrary, but not
|
|
random: Since the first bit of the TOS field is not interesting, it can be
|
|
ignored and therefore the range of values to consider is just [0;15], i.e. a
|
|
number of 16 different values. The next step is to populate the hash table:
|
|
\begin{Verbatim}
|
|
# alias tcfilter='tc filter add dev eth0 parent 1: prio 99'
|
|
# tcfilter u32 match u8 0 0 ht 1:0: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:1: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:2: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:3: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:4: classid 1:12
|
|
# tcfilter u32 match u8 0 0 ht 1:5: classid 1:12
|
|
# tcfilter u32 match u8 0 0 ht 1:6: classid 1:12
|
|
# tcfilter u32 match u8 0 0 ht 1:7: classid 1:12
|
|
# tcfilter u32 match u8 0 0 ht 1:8: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:9: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:a: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:b: classid 1:16
|
|
# tcfilter u32 match u8 0 0 ht 1:c: classid 1:10
|
|
# tcfilter u32 match u8 0 0 ht 1:d: classid 1:10
|
|
# tcfilter u32 match u8 0 0 ht 1:e: classid 1:10
|
|
# tcfilter u32 match u8 0 0 ht 1:f: classid 1:10
|
|
\end{Verbatim}
|
|
The parameter \texttt{ht} denotes the hash table and bucket the filter should be added
|
|
to. Since the first TOS bit is ignored, it's value has to be divided by two in
|
|
order to get to the bucket it maps to. E.g. a TOS value of 0x10 will therefore
|
|
map to bucket 0x8. For the sake of completeness, all possible values are mapped
|
|
and therefore a configurable default class is not required. Note that the used
|
|
match expression is not necessary, but mandatory. Therefore anything that
|
|
matches any packet will suffice. Finally, a filter which links to the defined
|
|
hash table is needed:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent 1: prio 1 protocol ip u32 \
|
|
link 1: hashkey mask 0x001e0000 match u8 0 0
|
|
\end{Verbatim}
|
|
Here again, the actual match statement is not necessary, but syntactically
|
|
required. All the magic lies within the \texttt{hashkey} parameter, which defines which
|
|
part of the packet should be used directly as hash key. Here's a drawing of the
|
|
first four bytes of the IPv4 header, with the area selected by \texttt{hashkey mask}
|
|
highlighted:
|
|
\begin{figure}[H]
|
|
\begin{Verbatim}
|
|
0 1 2 3
|
|
.-----------------------------------------------------------------.
|
|
| | | ######## | | |
|
|
| Version| IHL | #DSCP### | ECN| Total Length |
|
|
| | | ######## | | |
|
|
'-----------------------------------------------------------------'
|
|
\end{Verbatim}
|
|
\end{figure}
|
|
\noindent
|
|
This may look confusing at first, but keep in mind that bit- as well as
|
|
byte-ordering here is LSB while the mask value is written in MSB we humans use.
|
|
Therefore reading the mask is done like so, starting from left:
|
|
\begin{enumerate}
|
|
\item Skip the first byte (which contains Version and IHL fields).
|
|
\item Skip the lowest bit of the second byte (0x1e is even).
|
|
\item Mark the four following bits (0x1e is 11110 in binary).
|
|
\item Skip the remaining three bits of the second byte as well as the remaining two
|
|
bytes.
|
|
\end{enumerate}
|
|
Before doing the lookup, the kernel right-shifts the masked value by the amount
|
|
of zero-bits in \texttt{mask}, which implicitly also does the division by two which the
|
|
hash table depends on. With this setup, every packet has to pass exactly two
|
|
filters to be classified. Note that this filter is limited to IPv4 packets: Due
|
|
to the related Traffic Class field being at a different offset in the packet, it
|
|
would not work for IPv6. To use the same setup for IPv6 as well, a second
|
|
entry-level filter is necessary:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent 1: prio 2 protocol ipv6 u32 \
|
|
link 1: hashkey mask 0x01e00000 match u8 0 0
|
|
\end{Verbatim}
|
|
For illustration purposes, here again is a drawing of the first four bytes of
|
|
the IPv6 header, again with masked area highlighted:
|
|
\begin{figure}[H]
|
|
\begin{Verbatim}
|
|
0 1 2 3
|
|
.-----------------------------------------------------------------.
|
|
| | ######## | |
|
|
| Version| #Traffic Class| Flow Label |
|
|
| | ######## | |
|
|
'-----------------------------------------------------------------'
|
|
\end{Verbatim}
|
|
\end{figure}
|
|
\noindent
|
|
Reading the mask value is analogous to IPv4 with the added complexity that
|
|
Traffic Class spans over two bytes. Yet, for comparison there's a simple trick:
|
|
IPv6 has the interesting field shifted by four bits to the left, and the new
|
|
mask's value is shifted by the same amount. For further information about
|
|
\filter{u32} and what can be done with it, consult it's man page
|
|
\man{tc-u32(8)}.
|
|
|
|
Of course, the kernel provides many more filters than just \filter{basic},
|
|
\filter{flow} and \filter{u32} which have been presented above. As of now, the
|
|
remaining ones are:
|
|
\begin{description}
|
|
\item[bpf]
|
|
Filtering using Berkeley Packet Filter programs. The program's return
|
|
code determines the packet's destination class ID.
|
|
|
|
\item[cgroup]
|
|
Filter packets based on control groups. This is only useful for packets
|
|
originating from the local host, as control groups only exist in that
|
|
scope.
|
|
|
|
\item[flower]
|
|
An extended variant of the flow filter.
|
|
|
|
\item[fw]
|
|
Matches on firewall mark values previously assigned to the packet by
|
|
netfilter (or a filter action, see below for details). This allows to
|
|
export the classification algorithm into netfilter, which is very
|
|
convenient if appropriate rules exist on the same system in there
|
|
already.
|
|
|
|
\item[route]
|
|
Filter packets based on matching routing table entry. Basically
|
|
equivalent to the \texttt{fw} filter above, to make use of an already existing
|
|
extensive routing table setup.
|
|
|
|
\item[rsvp, rsvp6]
|
|
Implementation of the Resource Reservation Protocol in Linux, to react
|
|
upon requests sent by an RSVP daemon.
|
|
|
|
\item[tcindex]
|
|
Match packets based on tcindex value, which is usually set by the dsmark
|
|
qdisc. This is part of an approach to support Differentiated Services in
|
|
Linux, which is another topic on it's own.
|
|
\end{description}
|
|
|
|
|
|
\section*{Filter Actions}
|
|
|
|
The tc filter framework provides the infrastructure to another extensible set of
|
|
tools as well, namely tc actions. As the name suggests, they allow to do things
|
|
with packets (or associated data). (The list of) Actions are part of a given
|
|
filter. If it matches, each action it contains is executed in order before
|
|
returning the classification result. Since the action has direct access to the
|
|
latter, it is in theory possible for an action to react upon or even change the
|
|
filtering result - as long as the packet matched, of course. Yet none of the
|
|
currently in-tree actions make use of this.
|
|
|
|
The Generic Actions framework originally evolved out of the filters' ability to
|
|
police traffic to a given maximum bandwidth. One common use case for that is to
|
|
limit ingress traffic, dropping packets which exceed the threshold. A classic
|
|
setup example is like so:
|
|
\begin{Verbatim}
|
|
# tc qdisc add dev eth0 handle ffff: ingress
|
|
# tc filter add dev eth0 parent ffff: u32 \
|
|
match u32 0 0
|
|
police rate 1mbit burst 100k
|
|
\end{Verbatim}
|
|
The ingress qdisc is not a real one, but merely a point of reference for filters
|
|
to attach to which should get applied to incoming traffic. The \filter{u32} filter added
|
|
above matches on any packet and therefore limits the total incoming bandwidth to
|
|
1mbit/s, allowing bursts of up to 100kbytes. Using the new syntax, the filter
|
|
command changes slightly:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent ffff: u32 \
|
|
match u32 0 0 \
|
|
action police rate 1mbit burst 100k
|
|
\end{Verbatim}
|
|
The important detail is that this syntax allows to define multiple actions.
|
|
E.g. for testing purposes, it is possible to redirect exceeding traffic to the
|
|
loopback interface instead of dropping it:
|
|
\begin{Verbatim}
|
|
# tc filter add dev eth0 parent ffff: u32 \
|
|
match u32 0 0 \
|
|
action police rate 1mbit burst 100k conform-exceed pipe \
|
|
action mirred egress redirect dev lo
|
|
\end{Verbatim}
|
|
The added parameter \texttt{conform-exceed pipe} tells the police action to allow for
|
|
further actions to handle the exceeding packet.
|
|
|
|
Apart from \texttt{police} and \texttt{mirred} actions, there are a few more. Here's a full
|
|
list of the currently implemented ones:
|
|
\begin{description}
|
|
\item[bpf]
|
|
Apply a Berkeley Packet Filter program to the packet.
|
|
|
|
\item[connmark]
|
|
Set the packet's firewall mark to that of it's connection. This works by
|
|
searching the conntrack table for a matching entry. If found, the mark
|
|
is restored.
|
|
|
|
\item[csum]
|
|
Trigger recalculation of packet checksums. The supported protocols are:
|
|
IPv4, ICMP, IGMP, TCP, UDP and UDPLite.
|
|
|
|
\item[ipt]
|
|
Pass the packet to an iptables target. This allows to use iptables
|
|
extensions directly instead of having to go the extra mile via setting
|
|
an arbitrary firewall mark and matching on that from within netfilter.
|
|
|
|
\item[mirred]
|
|
Mirror or redirect packets. This is often combined with the ifb pseudo
|
|
device to share a common QoS setup between multiple interfaces or even
|
|
ingress traffic.
|
|
|
|
\item[nat]
|
|
Perform stateless Native Address Translation. This is certainly not
|
|
complete and therefore inferior to NAT using iptables: Although the
|
|
kernel module decides between TCP, UDP and ICMP traffic, it does not
|
|
handle typical problematic protocols such as active FTP or SIP.
|
|
|
|
\item[pedit]
|
|
Generic packet editing. This allows to alter arbitrary bytes of the
|
|
packet, either by specifying an offset into the packet or by naming a
|
|
packet header and field name to change. Currently, the latter is
|
|
implemented only for IPv4 yet.
|
|
|
|
\item[police]
|
|
Apply a bandwidth rate limiting policy. Packets exceeding it are dropped
|
|
by default, but may optionally be handled differently.
|
|
|
|
\item[simple]
|
|
This is rather an example than real action. All it does is print a
|
|
user-defined string together with a packet counter. Useful maybe for
|
|
debugging when filter statistics are not available or too complicated.
|
|
|
|
\item[skbedit]
|
|
Edit associated packet data, supports changing queue mapping, priority
|
|
field and firewall mark value.
|
|
|
|
\item[vlan]
|
|
Add/remove a VLAN header to/from the packet. This might serve as
|
|
alternative to using 802.1Q pseudo-interfaces in combination with
|
|
routing rules when e.g. packets for a given destination need to be
|
|
encapsulated.
|
|
\end{description}
|
|
|
|
|
|
\section*{Intermediate Functional Block}
|
|
|
|
The Intermediate Functional Block (\texttt{ifb}) pseudo network interface acts as a QoS
|
|
concentrator for multiple different sources of traffic. Packets from or to other
|
|
interfaces have to be redirected to it using the \texttt{mirred} action in order to be
|
|
handled, regularly routed traffic will be dropped. This way, a single stack of
|
|
qdiscs, classes and filters can be shared between multiple interfaces.
|
|
|
|
Here's a simple example to feed incoming traffic from multiple interfaces
|
|
through a Stochastic Fairness Queue (\qdisc{sfq}):
|
|
\begin{Verbatim}
|
|
(1) # modprobe ifb
|
|
(2) # ip link set ifb0 up
|
|
(3) # tc qdisc add dev ifb0 root sfq
|
|
\end{Verbatim}
|
|
The first step is to load the \texttt{ifb} kernel module (1). By default, this will
|
|
create two ifb devices: \iface{ifb0} and \iface{ifb1}. After setting
|
|
\iface{ifb0} up in (2), the root
|
|
qdisc is replaced by \qdisc{sfq} in (3). Finally, one can start redirecting ingress
|
|
traffic to \iface{ifb0}, e.g. from \iface{eth0}:
|
|
\begin{Verbatim}
|
|
# tc qdisc add dev eth0 handle ffff: ingress
|
|
# tc filter add dev eth0 parent ffff: u32 \
|
|
match u32 0 0 \
|
|
action mirred egress redirect dev ifb0
|
|
\end{Verbatim}
|
|
The same can be done for other interfaces, just replacing \iface{eth0} in the two
|
|
commands above. One thing to keep in mind here is the asymmetrical routing this
|
|
creates within the host doing the QoS: Incoming packets enter the system via
|
|
\iface{ifb0}, while corresponding replies leave directly via \iface{eth0}. This can be observed
|
|
using \cmd{tcpdump} on \iface{ifb0}, which shows the input part of the traffic only. What's
|
|
more confusing is that \cmd{tcpdump} on \iface{eth0} shows both incoming and outgoing traffic,
|
|
but the redirection is still effective - a simple prove is setting
|
|
\iface{ifb0} down,
|
|
which will interrupt the communication. Obviously \cmd{tcpdump} catches the packets to
|
|
dump before they enter the ingress qdisc, which is why it sees them while the
|
|
kernel itself doesn't.
|
|
|
|
|
|
\section*{Conclusion}
|
|
|
|
Once the steep learning curve has been mastered, the conglomerate of (classful)
|
|
qdiscs, filters and actions provides a highly sophisticated and flexible
|
|
infrastructure to perform QoS, which plays nicely along with routing and
|
|
firewalling setups.
|
|
|
|
|
|
\section*{Further Reading}
|
|
|
|
A good starting point for novice users and experienced ones diving into unknown
|
|
areas is the extensive HOWTO at \url{http://lartc.org}. The iproute2 package ships
|
|
some examples (usually in /usr/share/doc/, depending on distribution) as well as
|
|
man pages for \cmd{tc} in general, qdiscs and filters. The latter have been added
|
|
just recently though, so if your distribution does not ship iproute2 version
|
|
4.3.0 yet, these are not in there. Apart from that, the internet is a spring of
|
|
HOWTOs and scripts people wrote - though these should be taken with a grain of
|
|
salt: The complexity of the matter often leads to copying others' solutions
|
|
without much validation, which allows for less optimal or even obsolete
|
|
implementations to survive much longer than desired.
|
|
|
|
\end{document}
|