(Logical change 1.13)
This commit is contained in:
parent
c90e297870
commit
985794ad38
1809
man/man8/ip.8
1809
man/man8/ip.8
File diff suppressed because it is too large
Load Diff
|
|
@ -0,0 +1,72 @@
|
|||
.TH PBFIFO 8 "10 January 2002" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
pfifo \- Packet limited First In, First Out queue
|
||||
.P
|
||||
bfifo \- Byte limited First In, First Out queue
|
||||
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... add pfifo
|
||||
.B [ limit
|
||||
packets
|
||||
.B ]
|
||||
.P
|
||||
.B tc qdisc ... add bfifo
|
||||
.B [ limit
|
||||
bytes
|
||||
.B ]
|
||||
|
||||
.SH DESCRIPTION
|
||||
The pfifo and bfifo qdiscs are unadorned First In, First Out queues. They are the
|
||||
simplest queues possible and therefore have no overhead.
|
||||
.B pfifo
|
||||
constrains the queue size as measured in packets.
|
||||
.B bfifo
|
||||
does so as measured in bytes.
|
||||
|
||||
Like all non-default qdiscs, they maintain statistics. This might be a reason to prefer
|
||||
pfifo or bfifo over the default.
|
||||
|
||||
.SH ALGORITHM
|
||||
A list of packets is maintained, when a packet is enqueued it gets inserted at the tail of
|
||||
a list. When a packet needs to be sent out to the network, it is taken from the head of the list.
|
||||
|
||||
If the list is too long, no further packets are allowed on. This is called 'tail drop'.
|
||||
|
||||
.SH PARAMETERS
|
||||
.TP
|
||||
limit
|
||||
Maximum queue size. Specified in bytes for bfifo, in packets for pfifo. For pfifo, defaults
|
||||
to the interface txqueuelen, as specified with
|
||||
.BR ifconfig (8)
|
||||
or
|
||||
.BR ip (8).
|
||||
|
||||
For bfifo, it defaults to the txqueuelen multiplied by the interface MTU.
|
||||
|
||||
.SH OUTPUT
|
||||
The output of
|
||||
.B tc -s qdisc ls
|
||||
contains the limit, either in packets or in bytes, and the number of bytes
|
||||
and packets actually sent. An unsent and dropped packet only appears between braces
|
||||
and is not counted as 'Sent'.
|
||||
|
||||
In this example, the queue length is 100 packets, 45894 bytes were sent over 681 packets.
|
||||
No packets were dropped, and as the pfifo queue does not slow down packets, there were also no
|
||||
overlimits:
|
||||
.P
|
||||
.nf
|
||||
# tc -s qdisc ls dev eth0
|
||||
qdisc pfifo 8001: dev eth0 limit 100p
|
||||
Sent 45894 bytes 681 pkts (dropped 0, overlimits 0)
|
||||
.fi
|
||||
|
||||
If a backlog occurs, this is displayed as well.
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHORS
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>
|
||||
|
||||
This manpage maintained by bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,425 @@
|
|||
.TH CBQ 8 "8 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
CBQ \- Class Based Queueing
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... dev
|
||||
dev
|
||||
.B ( parent
|
||||
classid
|
||||
.B | root) [ handle
|
||||
major:
|
||||
.B ] cbq avpkt
|
||||
bytes
|
||||
.B bandwidth
|
||||
rate
|
||||
.B [ cell
|
||||
bytes
|
||||
.B ] [ ewma
|
||||
log
|
||||
.B ] [ mpu
|
||||
bytes
|
||||
.B ]
|
||||
|
||||
.B tc class ... dev
|
||||
dev
|
||||
.B parent
|
||||
major:[minor]
|
||||
.B [ classid
|
||||
major:minor
|
||||
.B ] cbq allot
|
||||
bytes
|
||||
.B [ bandwidth
|
||||
rate
|
||||
.B ] [ rate
|
||||
rate
|
||||
.B ] prio
|
||||
priority
|
||||
.B [ weight
|
||||
weight
|
||||
.B ] [ minburst
|
||||
packets
|
||||
.B ] [ maxburst
|
||||
packets
|
||||
.B ] [ ewma
|
||||
log
|
||||
.B ] [ cell
|
||||
bytes
|
||||
.B ] avpkt
|
||||
bytes
|
||||
.B [ mpu
|
||||
bytes
|
||||
.B ] [ bounded isolated ] [ split
|
||||
handle
|
||||
.B & defmap
|
||||
defmap
|
||||
.B ] [ estimator
|
||||
interval timeconstant
|
||||
.B ]
|
||||
|
||||
.SH DESCRIPTION
|
||||
Class Based Queueing is a classful qdisc that implements a rich
|
||||
linksharing hierarchy of classes. It contains shaping elements as
|
||||
well as prioritizing capabilities. Shaping is performed using link
|
||||
idle time calculations based on the timing of dequeue events and
|
||||
underlying link bandwidth.
|
||||
|
||||
.SH SHAPING ALGORITHM
|
||||
Shaping is done using link idle time calculations, and actions taken if
|
||||
these calculations deviate from set limits.
|
||||
|
||||
When shaping a 10mbit/s connection to 1mbit/s, the link will
|
||||
be idle 90% of the time. If it isn't, it needs to be throttled so that it
|
||||
IS idle 90% of the time.
|
||||
|
||||
From the kernel's perspective, this is hard to measure, so CBQ instead
|
||||
derives the idle time from the number of microseconds (in fact, jiffies)
|
||||
that elapse between requests from the device driver for more data. Combined
|
||||
with the knowledge of packet sizes, this is used to approximate how full or
|
||||
empty the link is.
|
||||
|
||||
This is rather circumspect and doesn't always arrive at proper
|
||||
results. For example, what is the actual link speed of an interface
|
||||
that is not really able to transmit the full 100mbit/s of data,
|
||||
perhaps because of a badly implemented driver? A PCMCIA network card
|
||||
will also never achieve 100mbit/s because of the way the bus is
|
||||
designed - again, how do we calculate the idle time?
|
||||
|
||||
The physical link bandwidth may be ill defined in case of not-quite-real
|
||||
network devices like PPP over Ethernet or PPTP over TCP/IP. The effective
|
||||
bandwidth in that case is probably determined by the efficiency of pipes
|
||||
to userspace - which not defined.
|
||||
|
||||
During operations, the effective idletime is measured using an
|
||||
exponential weighted moving average (EWMA), which considers recent
|
||||
packets to be exponentially more important than past ones. The Unix
|
||||
loadaverage is calculated in the same way.
|
||||
|
||||
The calculated idle time is subtracted from the EWMA measured one,
|
||||
the resulting number is called 'avgidle'. A perfectly loaded link has
|
||||
an avgidle of zero: packets arrive exactly at the calculated
|
||||
interval.
|
||||
|
||||
An overloaded link has a negative avgidle and if it gets too negative,
|
||||
CBQ throttles and is then 'overlimit'.
|
||||
|
||||
Conversely, an idle link might amass a huge avgidle, which would then
|
||||
allow infinite bandwidths after a few hours of silence. To prevent
|
||||
this, avgidle is capped at
|
||||
.B maxidle.
|
||||
|
||||
If overlimit, in theory, the CBQ could throttle itself for exactly the
|
||||
amount of time that was calculated to pass between packets, and then
|
||||
pass one packet, and throttle again. Due to timer resolution constraints,
|
||||
this may not be feasible, see the
|
||||
.B minburst
|
||||
parameter below.
|
||||
|
||||
.SH CLASSIFICATION
|
||||
Within the one CBQ instance many classes may exist. Each of these classes
|
||||
contains another qdisc, by default
|
||||
.BR tc-pfifo (8).
|
||||
|
||||
When enqueueing a packet, CBQ starts at the root and uses various methods to
|
||||
determine which class should receive the data. If a verdict is reached, this
|
||||
process is repeated for the recipient class which might have further
|
||||
means of classifying traffic to its children, if any.
|
||||
|
||||
CBQ has the following methods available to classify a packet to any child
|
||||
classes.
|
||||
.TP
|
||||
(i)
|
||||
.B skb->priority class encoding.
|
||||
Can be set from userspace by an application with the
|
||||
.B SO_PRIORITY
|
||||
setsockopt.
|
||||
The
|
||||
.B skb->priority class encoding
|
||||
only applies if the skb->priority holds a major:minor handle of an existing
|
||||
class within this qdisc.
|
||||
.TP
|
||||
(ii)
|
||||
tc filters attached to the class.
|
||||
.TP
|
||||
(iii)
|
||||
The defmap of a class, as set with the
|
||||
.B split & defmap
|
||||
parameters. The defmap may contain instructions for each possible Linux packet
|
||||
priority.
|
||||
|
||||
.P
|
||||
Each class also has a
|
||||
.B level.
|
||||
Leaf nodes, attached to the bottom of the class hierarchy, have a level of 0.
|
||||
.SH CLASSIFICATION ALGORITHM
|
||||
|
||||
Classification is a loop, which terminates when a leaf class is found. At any
|
||||
point the loop may jump to the fallback algorithm.
|
||||
|
||||
The loop consists of the following steps:
|
||||
.TP
|
||||
(i)
|
||||
If the packet is generated locally and has a valid classid encoded within its
|
||||
.B skb->priority,
|
||||
choose it and terminate.
|
||||
|
||||
.TP
|
||||
(ii)
|
||||
Consult the tc filters, if any, attached to this child. If these return
|
||||
a class which is not a leaf class, restart loop from the class returned.
|
||||
If it is a leaf, choose it and terminate.
|
||||
.TP
|
||||
(iii)
|
||||
If the tc filters did not return a class, but did return a classid,
|
||||
try to find a class with that id within this qdisc.
|
||||
Check if the found class is of a lower
|
||||
.B level
|
||||
than the current class. If so, and the returned class is not a leaf node,
|
||||
restart the loop at the found class. If it is a leaf node, terminate.
|
||||
If we found an upward reference to a higher level, enter the fallback
|
||||
algorithm.
|
||||
.TP
|
||||
(iv)
|
||||
If the tc filters did not return a class, nor a valid reference to one,
|
||||
consider the minor number of the reference to be the priority. Retrieve
|
||||
a class from the defmap of this class for the priority. If this did not
|
||||
contain a class, consult the defmap of this class for the
|
||||
.B BEST_EFFORT
|
||||
class. If this is an upward reference, or no
|
||||
.B BEST_EFFORT
|
||||
class was defined,
|
||||
enter the fallback algorithm. If a valid class was found, and it is not a
|
||||
leaf node, restart the loop at this class. If it is a leaf, choose it and
|
||||
terminate. If
|
||||
neither the priority distilled from the classid, nor the
|
||||
.B BEST_EFFORT
|
||||
priority yielded a class, enter the fallback algorithm.
|
||||
.P
|
||||
The fallback algorithm resides outside of the loop and is as follows.
|
||||
.TP
|
||||
(i)
|
||||
Consult the defmap of the class at which the jump to fallback occured. If
|
||||
the defmap contains a class for the
|
||||
.B
|
||||
priority
|
||||
of the class (which is related to the TOS field), choose this class and
|
||||
terminate.
|
||||
.TP
|
||||
(ii)
|
||||
Consult the map for a class for the
|
||||
.B BEST_EFFORT
|
||||
priority. If found, choose it, and terminate.
|
||||
.TP
|
||||
(iii)
|
||||
Choose the class at which break out to the fallback algorithm occured. Terminate.
|
||||
.P
|
||||
The packet is enqueued to the class which was chosen when either algorithm
|
||||
terminated. It is therefore possible for a packet to be enqueued *not* at a
|
||||
leaf node, but in the middle of the hierarchy.
|
||||
|
||||
.SH LINK SHARING ALGORITHM
|
||||
When dequeuing for sending to the network device, CBQ decides which of its
|
||||
classes will be allowed to send. It does so with a Weighted Round Robin process
|
||||
in which each class with packets gets a chance to send in turn. The WRR process
|
||||
starts by asking the highest priority classes (lowest numerically -
|
||||
highest semantically) for packets, and will continue to do so until they
|
||||
have no more data to offer, in which case the process repeats for lower
|
||||
priorities.
|
||||
|
||||
.B CERTAINTY ENDS HERE, ANK PLEASE HELP
|
||||
|
||||
Each class is not allowed to send at length though - they can only dequeue a
|
||||
configurable amount of data during each round.
|
||||
|
||||
If a class is about to go overlimit, and it is not
|
||||
.B bounded
|
||||
it will try to borrow avgidle from siblings that are not
|
||||
.B isolated.
|
||||
This process is repeated from the bottom upwards. If a class is unable
|
||||
to borrow enough avgidle to send a packet, it is throttled and not asked
|
||||
for a packet for enough time for the avgidle to increase above zero.
|
||||
|
||||
.B I REALLY NEED HELP FIGURING THIS OUT. REST OF DOCUMENT IS PRETTY CERTAIN
|
||||
.B AGAIN.
|
||||
|
||||
.SH QDISC
|
||||
The root qdisc of a CBQ class tree has the following parameters:
|
||||
|
||||
.TP
|
||||
parent major:minor | root
|
||||
This mandatory parameter determines the place of the CBQ instance, either at the
|
||||
.B root
|
||||
of an interface or within an existing class.
|
||||
.TP
|
||||
handle major:
|
||||
Like all other qdiscs, the CBQ can be assigned a handle. Should consist only
|
||||
of a major number, followed by a colon. Optional.
|
||||
.TP
|
||||
avpkt bytes
|
||||
For calculations, the average packet size must be known. It is silently capped
|
||||
at a minimum of 2/3 of the interface MTU. Mandatory.
|
||||
.TP
|
||||
bandwidth rate
|
||||
To determine the idle time, CBQ must know the bandwidth of your underlying
|
||||
physical interface, or parent qdisc. This is a vital parameter, more about it
|
||||
later. Mandatory.
|
||||
.TP
|
||||
cell
|
||||
The cell size determines he granularity of packet transmission time calculations. Has a sensible default.
|
||||
.TP
|
||||
mpu
|
||||
A zero sized packet may still take time to transmit. This value is the lower
|
||||
cap for packet transmission time calculations - packets smaller than this value
|
||||
are still deemed to have this size. Defaults to zero.
|
||||
.TP
|
||||
ewma log
|
||||
When CBQ needs to measure the average idle time, it does so using an
|
||||
Exponentially Weighted Moving Average which smoothes out measurements into
|
||||
a moving average. The EWMA LOG determines how much smoothing occurs. Defaults
|
||||
to 5. Lower values imply greater sensitivity. Must be between 0 and 31.
|
||||
.P
|
||||
A CBQ qdisc does not shape out of its own accord. It only needs to know certain
|
||||
parameters about the underlying link. Actual shaping is done in classes.
|
||||
|
||||
.SH CLASSES
|
||||
Classes have a host of parameters to configure their operation.
|
||||
|
||||
.TP
|
||||
parent major:minor
|
||||
Place of this class within the hierarchy. If attached directly to a qdisc
|
||||
and not to another class, minor can be omitted. Mandatory.
|
||||
.TP
|
||||
classid major:minor
|
||||
Like qdiscs, classes can be named. The major number must be equal to the
|
||||
major number of the qdisc to which it belongs. Optional, but needed if this
|
||||
class is going to have children.
|
||||
.TP
|
||||
weight weight
|
||||
When dequeuing to the interface, classes are tried for traffic in a
|
||||
round-robin fashion. Classes with a higher configured qdisc will generally
|
||||
have more traffic to offer during each round, so it makes sense to allow
|
||||
it to dequeue more traffic. All weights under a class are normalized, so
|
||||
only the ratios matter. Defaults to the configured rate, unless the priority
|
||||
of this class is maximal, in which case it is set to 1.
|
||||
.TP
|
||||
allot bytes
|
||||
Allot specifies how many bytes a qdisc can dequeue
|
||||
during each round of the process. This parameter is weighted using the
|
||||
renormalized class weight described above.
|
||||
|
||||
.TP
|
||||
priority priority
|
||||
In the round-robin process, classes with the lowest priority field are tried
|
||||
for packets first. Mandatory.
|
||||
|
||||
.TP
|
||||
rate rate
|
||||
Maximum rate this class and all its children combined can send at. Mandatory.
|
||||
|
||||
.TP
|
||||
bandwidth rate
|
||||
This is different from the bandwidth specified when creating a CBQ disc. Only
|
||||
used to determine maxidle and offtime, which are only calculated when
|
||||
specifying maxburst or minburst. Mandatory if specifying maxburst or minburst.
|
||||
|
||||
.TP
|
||||
maxburst
|
||||
This number of packets is used to calculate maxidle so that when
|
||||
avgidle is at maxidle, this number of average packets can be burst
|
||||
before avgidle drops to 0. Set it higher to be more tolerant of
|
||||
bursts. You can't set maxidle directly, only via this parameter.
|
||||
|
||||
.TP
|
||||
minburst
|
||||
As mentioned before, CBQ needs to throttle in case of
|
||||
overlimit. The ideal solution is to do so for exactly the calculated
|
||||
idle time, and pass 1 packet. However, Unix kernels generally have a
|
||||
hard time scheduling events shorter than 10ms, so it is better to
|
||||
throttle for a longer period, and then pass minburst packets in one
|
||||
go, and then sleep minburst times longer.
|
||||
|
||||
The time to wait is called the offtime. Higher values of minburst lead
|
||||
to more accurate shaping in the long term, but to bigger bursts at
|
||||
millisecond timescales.
|
||||
|
||||
.TP
|
||||
minidle
|
||||
If avgidle is below 0, we are overlimits and need to wait until
|
||||
avgidle will be big enough to send one packet. To prevent a sudden
|
||||
burst from shutting down the link for a prolonged period of time,
|
||||
avgidle is reset to minidle if it gets too low.
|
||||
|
||||
Minidle is specified in negative microseconds, so 10 means that
|
||||
avgidle is capped at -10us.
|
||||
|
||||
.TP
|
||||
bounded
|
||||
Signifies that this class will not borrow bandwidth from its siblings.
|
||||
.TP
|
||||
isolated
|
||||
Means that this class will not borrow bandwidth to its siblings
|
||||
|
||||
.TP
|
||||
split major:minor & defmap bitmap[/bitmap]
|
||||
If consulting filters attached to a class did not give a verdict,
|
||||
CBQ can also classify based on the packet's priority. There are 16
|
||||
priorities available, numbered from 0 to 15.
|
||||
|
||||
The defmap specifies which priorities this class wants to receive,
|
||||
specified as a bitmap. The Least Significant Bit corresponds to priority
|
||||
zero. The
|
||||
.B split
|
||||
parameter tells CBQ at which class the decision must be made, which should
|
||||
be a (grand)parent of the class you are adding.
|
||||
|
||||
As an example, 'tc class add ... classid 10:1 cbq .. split 10:0 defmap c0'
|
||||
configures class 10:0 to send packets with priorities 6 and 7 to 10:1.
|
||||
|
||||
The complimentary configuration would then
|
||||
be: 'tc class add ... classid 10:2 cbq ... split 10:0 defmap 3f'
|
||||
Which would send all packets 0, 1, 2, 3, 4 and 5 to 10:1.
|
||||
.TP
|
||||
estimator interval timeconstant
|
||||
CBQ can measure how much bandwidth each class is using, which tc filters
|
||||
can use to classify packets with. In order to determine the bandwidth
|
||||
it uses a very simple estimator that measures once every
|
||||
.B interval
|
||||
microseconds how much traffic has passed. This again is a EWMA, for which
|
||||
the time constant can be specified, also in microseconds. The
|
||||
.B time constant
|
||||
corresponds to the sluggishness of the measurement or, conversely, to the
|
||||
sensitivity of the average to short bursts. Higher values mean less
|
||||
sensitivity.
|
||||
|
||||
|
||||
|
||||
.SH SOURCES
|
||||
.TP
|
||||
o
|
||||
Sally Floyd and Van Jacobson, "Link-sharing and Resource
|
||||
Management Models for Packet Networks",
|
||||
IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995
|
||||
|
||||
.TP
|
||||
o
|
||||
Sally Floyd, "Notes on CBQ and Guarantee Service", 1995
|
||||
|
||||
.TP
|
||||
o
|
||||
Sally Floyd, "Notes on Class-Based Queueing: Setting
|
||||
Parameters", 1996
|
||||
|
||||
.TP
|
||||
o
|
||||
Sally Floyd and Michael Speer, "Experimental Results
|
||||
for Class-Based Queueing", 1998, not published.
|
||||
|
||||
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHOR
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>. This manpage maintained by
|
||||
bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,353 @@
|
|||
.TH CBQ 8 "16 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
CBQ \- Class Based Queueing
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... dev
|
||||
dev
|
||||
.B ( parent
|
||||
classid
|
||||
.B | root) [ handle
|
||||
major:
|
||||
.B ] cbq [ allot
|
||||
bytes
|
||||
.B ] avpkt
|
||||
bytes
|
||||
.B bandwidth
|
||||
rate
|
||||
.B [ cell
|
||||
bytes
|
||||
.B ] [ ewma
|
||||
log
|
||||
.B ] [ mpu
|
||||
bytes
|
||||
.B ]
|
||||
|
||||
.B tc class ... dev
|
||||
dev
|
||||
.B parent
|
||||
major:[minor]
|
||||
.B [ classid
|
||||
major:minor
|
||||
.B ] cbq allot
|
||||
bytes
|
||||
.B [ bandwidth
|
||||
rate
|
||||
.B ] [ rate
|
||||
rate
|
||||
.B ] prio
|
||||
priority
|
||||
.B [ weight
|
||||
weight
|
||||
.B ] [ minburst
|
||||
packets
|
||||
.B ] [ maxburst
|
||||
packets
|
||||
.B ] [ ewma
|
||||
log
|
||||
.B ] [ cell
|
||||
bytes
|
||||
.B ] avpkt
|
||||
bytes
|
||||
.B [ mpu
|
||||
bytes
|
||||
.B ] [ bounded isolated ] [ split
|
||||
handle
|
||||
.B & defmap
|
||||
defmap
|
||||
.B ] [ estimator
|
||||
interval timeconstant
|
||||
.B ]
|
||||
|
||||
.SH DESCRIPTION
|
||||
Class Based Queueing is a classful qdisc that implements a rich
|
||||
linksharing hierarchy of classes. It contains shaping elements as
|
||||
well as prioritizing capabilities. Shaping is performed using link
|
||||
idle time calculations based on the timing of dequeue events and
|
||||
underlying link bandwidth.
|
||||
|
||||
.SH SHAPING ALGORITHM
|
||||
When shaping a 10mbit/s connection to 1mbit/s, the link will
|
||||
be idle 90% of the time. If it isn't, it needs to be throttled so that it
|
||||
IS idle 90% of the time.
|
||||
|
||||
During operations, the effective idletime is measured using an
|
||||
exponential weighted moving average (EWMA), which considers recent
|
||||
packets to be exponentially more important than past ones. The Unix
|
||||
loadaverage is calculated in the same way.
|
||||
|
||||
The calculated idle time is subtracted from the EWMA measured one,
|
||||
the resulting number is called 'avgidle'. A perfectly loaded link has
|
||||
an avgidle of zero: packets arrive exactly at the calculated
|
||||
interval.
|
||||
|
||||
An overloaded link has a negative avgidle and if it gets too negative,
|
||||
CBQ throttles and is then 'overlimit'.
|
||||
|
||||
Conversely, an idle link might amass a huge avgidle, which would then
|
||||
allow infinite bandwidths after a few hours of silence. To prevent
|
||||
this, avgidle is capped at
|
||||
.B maxidle.
|
||||
|
||||
If overlimit, in theory, the CBQ could throttle itself for exactly the
|
||||
amount of time that was calculated to pass between packets, and then
|
||||
pass one packet, and throttle again. Due to timer resolution constraints,
|
||||
this may not be feasible, see the
|
||||
.B minburst
|
||||
parameter below.
|
||||
|
||||
.SH CLASSIFICATION
|
||||
Within the one CBQ instance many classes may exist. Each of these classes
|
||||
contains another qdisc, by default
|
||||
.BR tc-pfifo (8).
|
||||
|
||||
When enqueueing a packet, CBQ starts at the root and uses various methods to
|
||||
determine which class should receive the data.
|
||||
|
||||
In the absence of uncommon configuration options, the process is rather easy.
|
||||
At each node we look for an instruction, and then go to the class the
|
||||
instruction refers us to. If the class found is a barren leaf-node (without
|
||||
children), we enqueue the packet there. If it is not yet a leaf node, we do
|
||||
the whole thing over again starting from that node.
|
||||
|
||||
The following actions are performed, in order at each node we visit, until one
|
||||
sends us to another node, or terminates the process.
|
||||
.TP
|
||||
(i)
|
||||
Consult filters attached to the class. If sent to a leafnode, we are done.
|
||||
Otherwise, restart.
|
||||
.TP
|
||||
(ii)
|
||||
Consult the defmap for the priority assigned to this packet, which depends
|
||||
on the TOS bits. Check if the referral is leafless, otherwise restart.
|
||||
.TP
|
||||
(iii)
|
||||
Ask the defmap for instructions for the 'best effort' priority. Check the
|
||||
answer for leafness, otherwise restart.
|
||||
.TP
|
||||
(iv)
|
||||
If none of the above returned with an instruction, enqueue at this node.
|
||||
.P
|
||||
This algorithm makes sure that a packet always ends up somewhere, even while
|
||||
you are busy building your configuration.
|
||||
|
||||
For more details, see
|
||||
.BR tc-cbq-details(8).
|
||||
|
||||
.SH LINK SHARING ALGORITHM
|
||||
When dequeuing for sending to the network device, CBQ decides which of its
|
||||
classes will be allowed to send. It does so with a Weighted Round Robin process
|
||||
in which each class with packets gets a chance to send in turn. The WRR process
|
||||
starts by asking the highest priority classes (lowest numerically -
|
||||
highest semantically) for packets, and will continue to do so until they
|
||||
have no more data to offer, in which case the process repeats for lower
|
||||
priorities.
|
||||
|
||||
Classes by default borrow bandwidth from their siblings. A class can be
|
||||
prevented from doing so by declaring it 'bounded'. A class can also indicate
|
||||
its unwillingness to lend out bandwidth by being 'isolated'.
|
||||
|
||||
.SH QDISC
|
||||
The root of a CBQ qdisc class tree has the following parameters:
|
||||
|
||||
.TP
|
||||
parent major:minor | root
|
||||
This mandatory parameter determines the place of the CBQ instance, either at the
|
||||
.B root
|
||||
of an interface or within an existing class.
|
||||
.TP
|
||||
handle major:
|
||||
Like all other qdiscs, the CBQ can be assigned a handle. Should consist only
|
||||
of a major number, followed by a colon. Optional, but very useful if classes
|
||||
will be generated within this qdisc.
|
||||
.TP
|
||||
allot bytes
|
||||
This allotment is the 'chunkiness' of link sharing and is used for determining packet
|
||||
transmission time tables. The qdisc allot differs slightly from the class allot discussed
|
||||
below. Optional. Defaults to a reasonable value, related to avpkt.
|
||||
.TP
|
||||
avpkt bytes
|
||||
The average size of a packet is needed for calculating maxidle, and is also used
|
||||
for making sure 'allot' has a safe value. Mandatory.
|
||||
.TP
|
||||
bandwidth rate
|
||||
To determine the idle time, CBQ must know the bandwidth of your underlying
|
||||
physical interface, or parent qdisc. This is a vital parameter, more about it
|
||||
later. Mandatory.
|
||||
.TP
|
||||
cell
|
||||
The cell size determines he granularity of packet transmission time calculations. Has a sensible default.
|
||||
.TP
|
||||
mpu
|
||||
A zero sized packet may still take time to transmit. This value is the lower
|
||||
cap for packet transmission time calculations - packets smaller than this value
|
||||
are still deemed to have this size. Defaults to zero.
|
||||
.TP
|
||||
ewma log
|
||||
When CBQ needs to measure the average idle time, it does so using an
|
||||
Exponentially Weighted Moving Average which smoothes out measurements into
|
||||
a moving average. The EWMA LOG determines how much smoothing occurs. Lower
|
||||
values imply greater sensitivity. Must be between 0 and 31. Defaults
|
||||
to 5.
|
||||
.P
|
||||
A CBQ qdisc does not shape out of its own accord. It only needs to know certain
|
||||
parameters about the underlying link. Actual shaping is done in classes.
|
||||
|
||||
.SH CLASSES
|
||||
Classes have a host of parameters to configure their operation.
|
||||
|
||||
.TP
|
||||
parent major:minor
|
||||
Place of this class within the hierarchy. If attached directly to a qdisc
|
||||
and not to another class, minor can be omitted. Mandatory.
|
||||
.TP
|
||||
classid major:minor
|
||||
Like qdiscs, classes can be named. The major number must be equal to the
|
||||
major number of the qdisc to which it belongs. Optional, but needed if this
|
||||
class is going to have children.
|
||||
.TP
|
||||
weight weight
|
||||
When dequeuing to the interface, classes are tried for traffic in a
|
||||
round-robin fashion. Classes with a higher configured qdisc will generally
|
||||
have more traffic to offer during each round, so it makes sense to allow
|
||||
it to dequeue more traffic. All weights under a class are normalized, so
|
||||
only the ratios matter. Defaults to the configured rate, unless the priority
|
||||
of this class is maximal, in which case it is set to 1.
|
||||
.TP
|
||||
allot bytes
|
||||
Allot specifies how many bytes a qdisc can dequeue
|
||||
during each round of the process. This parameter is weighted using the
|
||||
renormalized class weight described above. Silently capped at a minimum of
|
||||
3/2 avpkt. Mandatory.
|
||||
|
||||
.TP
|
||||
prio priority
|
||||
In the round-robin process, classes with the lowest priority field are tried
|
||||
for packets first. Mandatory.
|
||||
|
||||
.TP
|
||||
avpkt
|
||||
See the QDISC section.
|
||||
|
||||
.TP
|
||||
rate rate
|
||||
Maximum rate this class and all its children combined can send at. Mandatory.
|
||||
|
||||
.TP
|
||||
bandwidth rate
|
||||
This is different from the bandwidth specified when creating a CBQ disc! Only
|
||||
used to determine maxidle and offtime, which are only calculated when
|
||||
specifying maxburst or minburst. Mandatory if specifying maxburst or minburst.
|
||||
|
||||
.TP
|
||||
maxburst
|
||||
This number of packets is used to calculate maxidle so that when
|
||||
avgidle is at maxidle, this number of average packets can be burst
|
||||
before avgidle drops to 0. Set it higher to be more tolerant of
|
||||
bursts. You can't set maxidle directly, only via this parameter.
|
||||
|
||||
.TP
|
||||
minburst
|
||||
As mentioned before, CBQ needs to throttle in case of
|
||||
overlimit. The ideal solution is to do so for exactly the calculated
|
||||
idle time, and pass 1 packet. However, Unix kernels generally have a
|
||||
hard time scheduling events shorter than 10ms, so it is better to
|
||||
throttle for a longer period, and then pass minburst packets in one
|
||||
go, and then sleep minburst times longer.
|
||||
|
||||
The time to wait is called the offtime. Higher values of minburst lead
|
||||
to more accurate shaping in the long term, but to bigger bursts at
|
||||
millisecond timescales. Optional.
|
||||
|
||||
.TP
|
||||
minidle
|
||||
If avgidle is below 0, we are overlimits and need to wait until
|
||||
avgidle will be big enough to send one packet. To prevent a sudden
|
||||
burst from shutting down the link for a prolonged period of time,
|
||||
avgidle is reset to minidle if it gets too low.
|
||||
|
||||
Minidle is specified in negative microseconds, so 10 means that
|
||||
avgidle is capped at -10us. Optional.
|
||||
|
||||
.TP
|
||||
bounded
|
||||
Signifies that this class will not borrow bandwidth from its siblings.
|
||||
.TP
|
||||
isolated
|
||||
Means that this class will not borrow bandwidth to its siblings
|
||||
|
||||
.TP
|
||||
split major:minor & defmap bitmap[/bitmap]
|
||||
If consulting filters attached to a class did not give a verdict,
|
||||
CBQ can also classify based on the packet's priority. There are 16
|
||||
priorities available, numbered from 0 to 15.
|
||||
|
||||
The defmap specifies which priorities this class wants to receive,
|
||||
specified as a bitmap. The Least Significant Bit corresponds to priority
|
||||
zero. The
|
||||
.B split
|
||||
parameter tells CBQ at which class the decision must be made, which should
|
||||
be a (grand)parent of the class you are adding.
|
||||
|
||||
As an example, 'tc class add ... classid 10:1 cbq .. split 10:0 defmap c0'
|
||||
configures class 10:0 to send packets with priorities 6 and 7 to 10:1.
|
||||
|
||||
The complimentary configuration would then
|
||||
be: 'tc class add ... classid 10:2 cbq ... split 10:0 defmap 3f'
|
||||
Which would send all packets 0, 1, 2, 3, 4 and 5 to 10:1.
|
||||
.TP
|
||||
estimator interval timeconstant
|
||||
CBQ can measure how much bandwidth each class is using, which tc filters
|
||||
can use to classify packets with. In order to determine the bandwidth
|
||||
it uses a very simple estimator that measures once every
|
||||
.B interval
|
||||
microseconds how much traffic has passed. This again is a EWMA, for which
|
||||
the time constant can be specified, also in microseconds. The
|
||||
.B time constant
|
||||
corresponds to the sluggishness of the measurement or, conversely, to the
|
||||
sensitivity of the average to short bursts. Higher values mean less
|
||||
sensitivity.
|
||||
|
||||
.SH BUGS
|
||||
The actual bandwidth of the underlying link may not be known, for example
|
||||
in the case of PPoE or PPTP connections which in fact may send over a
|
||||
pipe, instead of over a physical device. CBQ is quite resilient to major
|
||||
errors in the configured bandwidth, probably a the cost of coarser shaping.
|
||||
|
||||
Default kernels rely on coarse timing information for making decisions. These
|
||||
may make shaping precise in the long term, but inaccurate on second long scales.
|
||||
|
||||
See
|
||||
.BR tc-cbq-details(8)
|
||||
for hints on how to improve this.
|
||||
|
||||
.SH SOURCES
|
||||
.TP
|
||||
o
|
||||
Sally Floyd and Van Jacobson, "Link-sharing and Resource
|
||||
Management Models for Packet Networks",
|
||||
IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995
|
||||
|
||||
.TP
|
||||
o
|
||||
Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995
|
||||
|
||||
.TP
|
||||
o
|
||||
Sally Floyd, "Notes on Class-Based Queueing: Setting
|
||||
Parameters", 1996
|
||||
|
||||
.TP
|
||||
o
|
||||
Sally Floyd and Michael Speer, "Experimental Results
|
||||
for Class-Based Queueing", 1998, not published.
|
||||
|
||||
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHOR
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>. This manpage maintained by
|
||||
bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,150 @@
|
|||
.TH HTB 8 "10 January 2002" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
HTB \- Hierarchy Token Bucket
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... dev
|
||||
dev
|
||||
.B ( parent
|
||||
classid
|
||||
.B | root) [ handle
|
||||
major:
|
||||
.B ] htb [ default
|
||||
minor-id
|
||||
.B ]
|
||||
|
||||
.B tc class ... dev
|
||||
dev
|
||||
.B parent
|
||||
major:[minor]
|
||||
.B [ classid
|
||||
major:minor
|
||||
.B ] htb rate
|
||||
rate
|
||||
.B [ ceil
|
||||
rate
|
||||
.B ] burst
|
||||
bytes
|
||||
.B [ cburst
|
||||
bytes
|
||||
.B ] [ prio
|
||||
priority
|
||||
.B ]
|
||||
|
||||
.SH DESCRIPTION
|
||||
HTB is meant as a more understandable and intuitive replacement for
|
||||
the CBQ qdisc in Linux. Both CBQ and HTB help you to control the use
|
||||
of the outbound bandwidth on a given link. Both allow you to use one
|
||||
physical link to simulate several slower links and to send different
|
||||
kinds of traffic on different simulated links. In both cases, you have
|
||||
to specify how to divide the physical link into simulated links and
|
||||
how to decide which simulated link to use for a given packet to be sent.
|
||||
|
||||
Unlike CBQ, HTB shapes traffic based on the Token Bucket Filter algorithm
|
||||
which does not depend on interface characteristics and so does not need to
|
||||
know the underlying bandwidth of the outgoing interface.
|
||||
|
||||
.SH SHAPING ALGORITHM
|
||||
Shaping works as documented in
|
||||
.B tc-tbf (8).
|
||||
|
||||
.SH CLASSIFICATION
|
||||
Within the one HRB instance many classes may exist. Each of these classes
|
||||
contains another qdisc, by default
|
||||
.BR tc-pfifo (8).
|
||||
|
||||
When enqueueing a packet, HTB starts at the root and uses various methods to
|
||||
determine which class should receive the data.
|
||||
|
||||
In the absence of uncommon configuration options, the process is rather easy.
|
||||
At each node we look for an instruction, and then go to the class the
|
||||
instruction refers us to. If the class found is a barren leaf-node (without
|
||||
children), we enqueue the packet there. If it is not yet a leaf node, we do
|
||||
the whole thing over again starting from that node.
|
||||
|
||||
The following actions are performed, in order at each node we visit, until one
|
||||
sends us to another node, or terminates the process.
|
||||
.TP
|
||||
(i)
|
||||
Consult filters attached to the class. If sent to a leafnode, we are done.
|
||||
Otherwise, restart.
|
||||
.TP
|
||||
(ii)
|
||||
If none of the above returned with an instruction, enqueue at this node.
|
||||
.P
|
||||
This algorithm makes sure that a packet always ends up somewhere, even while
|
||||
you are busy building your configuration.
|
||||
|
||||
.SH LINK SHARING ALGORITHM
|
||||
FIXME
|
||||
|
||||
.SH QDISC
|
||||
The root of a HTB qdisc class tree has the following parameters:
|
||||
|
||||
.TP
|
||||
parent major:minor | root
|
||||
This mandatory parameter determines the place of the HTB instance, either at the
|
||||
.B root
|
||||
of an interface or within an existing class.
|
||||
.TP
|
||||
handle major:
|
||||
Like all other qdiscs, the HTB can be assigned a handle. Should consist only
|
||||
of a major number, followed by a colon. Optional, but very useful if classes
|
||||
will be generated within this qdisc.
|
||||
.TP
|
||||
default minor-id
|
||||
Unclassified traffic gets sent to the class with this minor-id.
|
||||
|
||||
.SH CLASSES
|
||||
Classes have a host of parameters to configure their operation.
|
||||
|
||||
.TP
|
||||
parent major:minor
|
||||
Place of this class within the hierarchy. If attached directly to a qdisc
|
||||
and not to another class, minor can be omitted. Mandatory.
|
||||
.TP
|
||||
classid major:minor
|
||||
Like qdiscs, classes can be named. The major number must be equal to the
|
||||
major number of the qdisc to which it belongs. Optional, but needed if this
|
||||
class is going to have children.
|
||||
.TP
|
||||
prio priority
|
||||
In the round-robin process, classes with the lowest priority field are tried
|
||||
for packets first. Mandatory.
|
||||
|
||||
.TP
|
||||
rate rate
|
||||
Maximum rate this class and all its children are guaranteed. Mandatory.
|
||||
|
||||
.TP
|
||||
ceil rate
|
||||
Maximum rate at which a class can send, if its parent has bandwidth to spare.
|
||||
Defaults to the configured rate, which implies no borrowing
|
||||
|
||||
.TP
|
||||
burst bytes
|
||||
Amount of bytes that can be burst at
|
||||
.B ceil
|
||||
speed, in excess of the configured
|
||||
.B rate.
|
||||
Should be at least as high as the highest burst of all children.
|
||||
|
||||
.TP
|
||||
cburst bytes
|
||||
Amount of bytes that can be burst at 'infinite' speed, in other words, as fast
|
||||
as the interface can transmit them. For perfect evening out, should be equal to at most one average
|
||||
packet. Should be at least as high as the highest cburst of all children.
|
||||
|
||||
.SH NOTES
|
||||
Due to Unix timing constraints, the maximum ceil rate is not infinite and may in fact be quite low. On Intel,
|
||||
there are 100 timer events per second, the maximum rate is that rate at which 'burst' bytes are sent each timer tick.
|
||||
From this, the mininum burst size for a specified rate can be calculated. For i386, a 10mbit rate requires a 12 kilobyte
|
||||
burst as 100*12kb*8 equals 10mbit.
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
.P
|
||||
HTB website: http://luxik.cdi.cz/~devik/qos/htb/
|
||||
.SH AUTHOR
|
||||
Martin Devera <devik@cdi.cz>. This manpage maintained by bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,72 @@
|
|||
.TH PBFIFO 8 "10 January 2002" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
pfifo \- Packet limited First In, First Out queue
|
||||
.P
|
||||
bfifo \- Byte limited First In, First Out queue
|
||||
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... add pfifo
|
||||
.B [ limit
|
||||
packets
|
||||
.B ]
|
||||
.P
|
||||
.B tc qdisc ... add bfifo
|
||||
.B [ limit
|
||||
bytes
|
||||
.B ]
|
||||
|
||||
.SH DESCRIPTION
|
||||
The pfifo and bfifo qdiscs are unadorned First In, First Out queues. They are the
|
||||
simplest queues possible and therefore have no overhead.
|
||||
.B pfifo
|
||||
constrains the queue size as measured in packets.
|
||||
.B bfifo
|
||||
does so as measured in bytes.
|
||||
|
||||
Like all non-default qdiscs, they maintain statistics. This might be a reason to prefer
|
||||
pfifo or bfifo over the default.
|
||||
|
||||
.SH ALGORITHM
|
||||
A list of packets is maintained, when a packet is enqueued it gets inserted at the tail of
|
||||
a list. When a packet needs to be sent out to the network, it is taken from the head of the list.
|
||||
|
||||
If the list is too long, no further packets are allowed on. This is called 'tail drop'.
|
||||
|
||||
.SH PARAMETERS
|
||||
.TP
|
||||
limit
|
||||
Maximum queue size. Specified in bytes for bfifo, in packets for pfifo. For pfifo, defaults
|
||||
to the interface txqueuelen, as specified with
|
||||
.BR ifconfig (8)
|
||||
or
|
||||
.BR ip (8).
|
||||
|
||||
For bfifo, it defaults to the txqueuelen multiplied by the interface MTU.
|
||||
|
||||
.SH OUTPUT
|
||||
The output of
|
||||
.B tc -s qdisc ls
|
||||
contains the limit, either in packets or in bytes, and the number of bytes
|
||||
and packets actually sent. An unsent and dropped packet only appears between braces
|
||||
and is not counted as 'Sent'.
|
||||
|
||||
In this example, the queue length is 100 packets, 45894 bytes were sent over 681 packets.
|
||||
No packets were dropped, and as the pfifo queue does not slow down packets, there were also no
|
||||
overlimits:
|
||||
.P
|
||||
.nf
|
||||
# tc -s qdisc ls dev eth0
|
||||
qdisc pfifo 8001: dev eth0 limit 100p
|
||||
Sent 45894 bytes 681 pkts (dropped 0, overlimits 0)
|
||||
.fi
|
||||
|
||||
If a backlog occurs, this is displayed as well.
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHORS
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>
|
||||
|
||||
This manpage maintained by bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,59 @@
|
|||
.TH PFIFO_FAST 8 "10 January 2002" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
pfifo_fast \- three-band first in, first out queue
|
||||
|
||||
.SH DESCRIPTION
|
||||
pfifo_fast is the default qdisc of each interface.
|
||||
|
||||
Whenever an interface is created, the pfifo_fast qdisc is automatically used
|
||||
as a queue. If another qdisc is attached, it preempts the default
|
||||
pfifo_fast, which automatically returns to function when an existing qdisc
|
||||
is detached.
|
||||
|
||||
In this sense this qdisc is magic, and unlike other qdiscs.
|
||||
|
||||
.SH ALGORITHM
|
||||
The algorithm is very similar to that of the classful
|
||||
.BR tc-prio (8)
|
||||
qdisc.
|
||||
.B pfifo_fast
|
||||
is like three
|
||||
.BR tc-pfifo (8)
|
||||
queues side by side, where packets can be enqueued in any of the three bands
|
||||
based on their Type of Service bits or assigned priority.
|
||||
|
||||
Not all three bands are dequeued simultaneously - as long as lower bands
|
||||
have traffic, higher bands are never dequeued. This can be used to
|
||||
prioritize interactive traffic or penalize 'lowest cost' traffic.
|
||||
|
||||
Each band can be txqueuelen packets long, as configured with
|
||||
.BR ifconfig (8)
|
||||
or
|
||||
.BR ip (8).
|
||||
Additional packets coming in are not enqueued but are instead dropped.
|
||||
|
||||
See
|
||||
.BR tc-prio (8)
|
||||
for complete details on how TOS bits are translated into bands.
|
||||
.SH PARAMETERS
|
||||
.TP
|
||||
txqueuelen
|
||||
The length of the three bands depends on the interface txqueuelen, as
|
||||
specified with
|
||||
.BR ifconfig (8)
|
||||
or
|
||||
.BR ip (8).
|
||||
|
||||
.SH BUGS
|
||||
Does not maintain statistics and does not show up in tc qdisc ls. This is because
|
||||
it is the automatic default in the absence of a configured qdisc.
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHORS
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>
|
||||
|
||||
This manpage maintained by bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,187 @@
|
|||
.TH PRIO 8 "16 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
PRIO \- Priority qdisc
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... dev
|
||||
dev
|
||||
.B ( parent
|
||||
classid
|
||||
.B | root) [ handle
|
||||
major:
|
||||
.B ] prio [ bands
|
||||
bands
|
||||
.B ] [ priomap
|
||||
band,band,band...
|
||||
.B ] [ estimator
|
||||
interval timeconstant
|
||||
.B ]
|
||||
|
||||
.SH DESCRIPTION
|
||||
The PRIO qdisc is a simple classful queueing discipline that contains
|
||||
an arbitrary number of classes of differing priority. The classes are
|
||||
dequeued in numerical descending order of priority. PRIO is a scheduler
|
||||
and never delays packets - it is a work-conserving qdisc, though the qdiscs
|
||||
contained in the classes may not be.
|
||||
|
||||
Very useful for lowering latency when there is no need for slowing down
|
||||
traffic.
|
||||
|
||||
.SH ALGORITHM
|
||||
On creation with 'tc qdisc add', a fixed number of bands is created. Each
|
||||
band is a class, although is not possible to add classes with 'tc qdisc
|
||||
add', the number of bands to be created must instead be specified on the
|
||||
commandline attaching PRIO to its root.
|
||||
|
||||
When dequeueing, band 0 is tried first and only if it did not deliver a
|
||||
packet does PRIO try band 1, and so onwards. Maximum reliability packets
|
||||
should therefore go to band 0, minimum delay to band 1 and the rest to band
|
||||
2.
|
||||
|
||||
As the PRIO qdisc itself will have minor number 0, band 0 is actually
|
||||
major:1, band 1 is major:2, etc. For major, substitute the major number
|
||||
assigned to the qdisc on 'tc qdisc add' with the
|
||||
.B handle
|
||||
parameter.
|
||||
|
||||
.SH CLASSIFICATION
|
||||
Three methods are available to PRIO to determine in which band a packet will
|
||||
be enqueued.
|
||||
.TP
|
||||
From userspace
|
||||
A process with sufficient privileges can encode the destination class
|
||||
directly with SO_PRIORITY, see
|
||||
.BR tc(7).
|
||||
.TP
|
||||
with a tc filter
|
||||
A tc filter attached to the root qdisc can point traffic directly to a class
|
||||
.TP
|
||||
with the priomap
|
||||
Based on the packet priority, which in turn is derived from the Type of
|
||||
Service assigned to the packet.
|
||||
.P
|
||||
Only the priomap is specific to this qdisc.
|
||||
.SH QDISC PARAMETERS
|
||||
.TP
|
||||
bands
|
||||
Number of bands. If changed from the default of 3,
|
||||
.B priomap
|
||||
must be updated as well.
|
||||
.TP
|
||||
priomap
|
||||
The priomap maps the priority of
|
||||
a packet to a class. The priority can either be set directly from userspace,
|
||||
or be derived from the Type of Service of the packet.
|
||||
|
||||
Determines how packet priorities, as assigned by the kernel, map to
|
||||
bands. Mapping occurs based on the TOS octet of the packet, which looks like
|
||||
this:
|
||||
|
||||
.nf
|
||||
0 1 2 3 4 5 6 7
|
||||
+---+---+---+---+---+---+---+---+
|
||||
| | | |
|
||||
|PRECEDENCE | TOS |MBZ|
|
||||
| | | |
|
||||
+---+---+---+---+---+---+---+---+
|
||||
.fi
|
||||
|
||||
The four TOS bits (the 'TOS field') are defined as:
|
||||
|
||||
.nf
|
||||
Binary Decimcal Meaning
|
||||
-----------------------------------------
|
||||
1000 8 Minimize delay (md)
|
||||
0100 4 Maximize throughput (mt)
|
||||
0010 2 Maximize reliability (mr)
|
||||
0001 1 Minimize monetary cost (mmc)
|
||||
0000 0 Normal Service
|
||||
.fi
|
||||
|
||||
As there is 1 bit to the right of these four bits, the actual value of the
|
||||
TOS field is double the value of the TOS bits. Tcpdump -v -v shows you the
|
||||
value of the entire TOS field, not just the four bits. It is the value you
|
||||
see in the first column of this table:
|
||||
|
||||
.nf
|
||||
TOS Bits Means Linux Priority Band
|
||||
------------------------------------------------------------
|
||||
0x0 0 Normal Service 0 Best Effort 1
|
||||
0x2 1 Minimize Monetary Cost 1 Filler 2
|
||||
0x4 2 Maximize Reliability 0 Best Effort 1
|
||||
0x6 3 mmc+mr 0 Best Effort 1
|
||||
0x8 4 Maximize Throughput 2 Bulk 2
|
||||
0xa 5 mmc+mt 2 Bulk 2
|
||||
0xc 6 mr+mt 2 Bulk 2
|
||||
0xe 7 mmc+mr+mt 2 Bulk 2
|
||||
0x10 8 Minimize Delay 6 Interactive 0
|
||||
0x12 9 mmc+md 6 Interactive 0
|
||||
0x14 10 mr+md 6 Interactive 0
|
||||
0x16 11 mmc+mr+md 6 Interactive 0
|
||||
0x18 12 mt+md 4 Int. Bulk 1
|
||||
0x1a 13 mmc+mt+md 4 Int. Bulk 1
|
||||
0x1c 14 mr+mt+md 4 Int. Bulk 1
|
||||
0x1e 15 mmc+mr+mt+md 4 Int. Bulk 1
|
||||
.fi
|
||||
|
||||
The second column contains the value of the relevant
|
||||
four TOS bits, followed by their translated meaning. For example, 15 stands
|
||||
for a packet wanting Minimal Montetary Cost, Maximum Reliability, Maximum
|
||||
Throughput AND Minimum Delay.
|
||||
|
||||
The fourth column lists the way the Linux kernel interprets the TOS bits, by
|
||||
showing to which Priority they are mapped.
|
||||
|
||||
The last column shows the result of the default priomap. On the commandline,
|
||||
the default priomap looks like this:
|
||||
|
||||
1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
|
||||
|
||||
This means that priority 4, for example, gets mapped to band number 1.
|
||||
The priomap also allows you to list higher priorities (> 7) which do not
|
||||
correspond to TOS mappings, but which are set by other means.
|
||||
|
||||
This table from RFC 1349 (read it for more details) explains how
|
||||
applications might very well set their TOS bits:
|
||||
|
||||
.nf
|
||||
TELNET 1000 (minimize delay)
|
||||
FTP
|
||||
Control 1000 (minimize delay)
|
||||
Data 0100 (maximize throughput)
|
||||
|
||||
TFTP 1000 (minimize delay)
|
||||
|
||||
SMTP
|
||||
Command phase 1000 (minimize delay)
|
||||
DATA phase 0100 (maximize throughput)
|
||||
|
||||
Domain Name Service
|
||||
UDP Query 1000 (minimize delay)
|
||||
TCP Query 0000
|
||||
Zone Transfer 0100 (maximize throughput)
|
||||
|
||||
NNTP 0001 (minimize monetary cost)
|
||||
|
||||
ICMP
|
||||
Errors 0000
|
||||
Requests 0000 (mostly)
|
||||
Responses <same as request> (mostly)
|
||||
.fi
|
||||
|
||||
|
||||
.SH CLASSES
|
||||
PRIO classes cannot be configured further - they are automatically created
|
||||
when the PRIO qdisc is attached. Each class however can contain yet a
|
||||
further qdisc.
|
||||
|
||||
.SH BUGS
|
||||
Large amounts of traffic in the lower bands can cause starvation of higher
|
||||
bands. Can be prevented by attaching a shaper (for example,
|
||||
.BR tc-tbf(8)
|
||||
to these bands to make sure they cannot dominate the link.
|
||||
|
||||
.SH AUTHORS
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>, J Hadi Salim
|
||||
<hadi@cyberus.ca>. This manpage maintained by bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,131 @@
|
|||
.TH RED 8 "13 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
red \- Random Early Detection
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... red
|
||||
.B limit
|
||||
bytes
|
||||
.B min
|
||||
bytes
|
||||
.B max
|
||||
bytes
|
||||
.B avpkt
|
||||
bytes
|
||||
.B burst
|
||||
packets
|
||||
.B [ ecn ] [ bandwidth
|
||||
rate
|
||||
.B ] probability
|
||||
chance
|
||||
|
||||
.SH DESCRIPTION
|
||||
Random Early Detection is a classless qdisc which manages its queue size
|
||||
smartly. Regular queues simply drop packets from the tail when they are
|
||||
full, which may not be the optimal behaviour. RED also performs tail drop,
|
||||
but does so in a more gradual way.
|
||||
|
||||
Once the queue hits a certain average length, packets enqueued have a
|
||||
configurable chance of being marked (which may mean dropped). This chance
|
||||
increases linearly up to a point called the
|
||||
.B max
|
||||
average queue length, although the queue might get bigger.
|
||||
|
||||
This has a host of benefits over simple taildrop, while not being processor
|
||||
intensive. It prevents synchronous retransmits after a burst in traffic,
|
||||
which cause further retransmits, etc.
|
||||
|
||||
The goal is the have a small queue size, which is good for interactivity
|
||||
while not disturbing TCP/IP traffic with too many sudden drops after a burst
|
||||
of traffic.
|
||||
|
||||
Depending on if ECN is configured, marking either means dropping or
|
||||
purely marking a packet as overlimit.
|
||||
.SH ALGORITHM
|
||||
The average queue size is used for determining the marking
|
||||
probability. This is calculated using an Exponential Weighted Moving
|
||||
Average, which can be more or less sensitive to bursts.
|
||||
|
||||
When the average queue size is below
|
||||
.B min
|
||||
bytes, no packet will ever be marked. When it exceeds
|
||||
.B min,
|
||||
the probability of doing so climbs linearly up
|
||||
to
|
||||
.B probability,
|
||||
until the average queue size hits
|
||||
.B max
|
||||
bytes. Because
|
||||
.B probability
|
||||
is normally not set to 100%, the queue size might
|
||||
conceivably rise above
|
||||
.B max
|
||||
bytes, so the
|
||||
.B limit
|
||||
parameter is provided to set a hard maximum for the size of the queue.
|
||||
|
||||
.SH PARAMETERS
|
||||
.TP
|
||||
min
|
||||
Average queue size at which marking becomes a possibility.
|
||||
.TP
|
||||
max
|
||||
At this average queue size, the marking probability is maximal. Should be at
|
||||
least twice
|
||||
.B min
|
||||
to prevent synchronous retransmits, higher for low
|
||||
.B min.
|
||||
.TP
|
||||
probability
|
||||
Maximum probability for marking, specified as a floating point
|
||||
number from 0.0 to 1.0. Suggested values are 0.01 or 0.02 (1 or 2%,
|
||||
respectively).
|
||||
.TP
|
||||
limit
|
||||
Hard limit on the real (not average) queue size in bytes. Further packets
|
||||
are dropped. Should be set higher than max+burst. It is advised to set this
|
||||
a few times higher than
|
||||
.B max.
|
||||
.TP
|
||||
burst
|
||||
Used for determining how fast the average queue size is influenced by the
|
||||
real queue size. Larger values make the calculation more sluggish, allowing
|
||||
longer bursts of traffic before marking starts. Real life experiments
|
||||
support the following guideline: (min+min+max)/(3*avpkt).
|
||||
.TP
|
||||
avpkt
|
||||
Specified in bytes. Used with burst to determine the time constant for
|
||||
average queue size calculations. 1000 is a good value.
|
||||
.TP
|
||||
bandwidth
|
||||
This rate is used for calculating the average queue size after some
|
||||
idle time. Should be set to the bandwidth of your interface. Does not mean
|
||||
that RED will shape for you! Optional.
|
||||
.TP
|
||||
ecn
|
||||
As mentioned before, RED can either 'mark' or 'drop'. Explicit Congestion
|
||||
Notification allows RED to notify remote hosts that their rate exceeds the
|
||||
amount of bandwidth available. Non-ECN capable hosts can only be notified by
|
||||
dropping a packet. If this parameter is specified, packets which indicate
|
||||
that their hosts honor ECN will only be marked and not dropped, unless the
|
||||
queue size hits
|
||||
.B limit
|
||||
bytes. Needs a tc binary with RED support compiled in. Recommended.
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH SOURCES
|
||||
.TP
|
||||
o
|
||||
Floyd, S., and Jacobson, V., Random Early Detection gateways for
|
||||
Congestion Avoidance. http://www.aciri.org/floyd/papers/red/red.html
|
||||
.TP
|
||||
o
|
||||
Some changes to the algorithm by Alexey N. Kuznetsov.
|
||||
|
||||
.SH AUTHORS
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>, Alexey Makarenko
|
||||
<makar@phoenix.kharkov.ua>, J Hadi Salim <hadi@nortelnetworks.com>.
|
||||
This manpage maintained by bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,107 @@
|
|||
.TH TC 8 "8 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
sfq \- Stochastic Fairness Queueing
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... perturb
|
||||
seconds
|
||||
.B quantum
|
||||
bytes
|
||||
|
||||
.SH DESCRIPTION
|
||||
|
||||
Stochastic Fairness Queueing is a classless queueing discipline available for
|
||||
traffic control with the
|
||||
.BR tc (8)
|
||||
command.
|
||||
|
||||
SFQ does not shape traffic but only schedules the transmission of packets, based on 'flows'.
|
||||
The goal is to ensure fairness so that each flow is able to send data in turn, thus preventing
|
||||
any single flow from drowning out the rest.
|
||||
|
||||
This may in fact have some effect in mitigating a Denial of Service attempt.
|
||||
|
||||
SFQ is work-conserving and therefore always delivers a packet if it has one available.
|
||||
.SH ALGORITHM
|
||||
On enqueueing, each packet is assigned to a hash bucket, based on
|
||||
.TP
|
||||
(i)
|
||||
Source address
|
||||
.TP
|
||||
(ii)
|
||||
Destination address
|
||||
.TP
|
||||
(iii)
|
||||
Source port
|
||||
.P
|
||||
If these are available. SFQ knows about ipv4 and ipv6 and also UDP, TCP and ESP.
|
||||
Packets with other protocols are hashed based on the 32bits representation of their
|
||||
destination and the socket they belong to. A flow corresponds mostly to a TCP/IP
|
||||
connection.
|
||||
|
||||
Each of these buckets should represent a unique flow. Because multiple flows may
|
||||
get hashed to the same bucket, the hashing algorithm is perturbed at configurable
|
||||
intervals so that the unfairness lasts only for a short while. Perturbation may
|
||||
however cause some inadvertent packet reordering to occur.
|
||||
|
||||
When dequeuing, each hashbucket with data is queried in a round robin fashion.
|
||||
|
||||
The compile time maximum length of the SFQ is 128 packets, which can be spread over
|
||||
at most 128 buckets of 1024 available. In case of overflow, tail-drop is performed
|
||||
on the fullest bucket, thus maintaining fairness.
|
||||
|
||||
.SH PARAMETERS
|
||||
.TP
|
||||
perturb
|
||||
Interval in seconds for queue algorithm perturbation. Defaults to 0, which means that
|
||||
no perturbation occurs. Do not set too low for each perturbation may cause some packet
|
||||
reordering. Advised value: 10
|
||||
.TP
|
||||
quantum
|
||||
Amount of bytes a flow is allowed to dequeue during a round of the round robin process.
|
||||
Defaults to the MTU of the interface which is also the advised value and the minimum value.
|
||||
|
||||
.SH EXAMPLE & USAGE
|
||||
|
||||
To attach to device ppp0:
|
||||
.P
|
||||
# tc qdisc add dev ppp0 root sfq perturb 10
|
||||
.P
|
||||
Please note that SFQ, like all non-shaping (work-conserving) qdiscs, is only useful
|
||||
if it owns the queue.
|
||||
This is the case when the link speed equals the actually available bandwidth. This holds
|
||||
for regular phone modems, ISDN connections and direct non-switched ethernet links.
|
||||
.P
|
||||
Most often, cable modems and DSL devices do not fall into this category. The same holds
|
||||
for when connected to a switch and trying to send data to a congested segment also
|
||||
connected to the switch.
|
||||
.P
|
||||
In this case, the effective queue does not reside within Linux and is therefore not
|
||||
available for scheduling.
|
||||
.P
|
||||
Embed SFQ in a classful qdisc to make sure it owns the queue.
|
||||
|
||||
.SH SOURCE
|
||||
.TP
|
||||
o
|
||||
Paul E. McKenney "Stochastic Fairness Queuing",
|
||||
IEEE INFOCOMM'90 Proceedings, San Francisco, 1990.
|
||||
|
||||
.TP
|
||||
o
|
||||
Paul E. McKenney "Stochastic Fairness Queuing",
|
||||
"Interworking: Research and Experience", v.2, 1991, p.113-131.
|
||||
|
||||
.TP
|
||||
o
|
||||
See also:
|
||||
M. Shreedhar and George Varghese "Efficient Fair
|
||||
Queuing using Deficit Round Robin", Proc. SIGCOMM 95.
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHOR
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>. This manpage maintained by
|
||||
bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,138 @@
|
|||
.TH TC 8 "13 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
tbf \- Token Bucket Filter
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc ... tbf rate
|
||||
rate
|
||||
.B burst
|
||||
bytes/cell
|
||||
.B ( latency
|
||||
ms
|
||||
.B | limit
|
||||
bytes
|
||||
.B ) [ mpu
|
||||
bytes
|
||||
.B [ peakrate
|
||||
rate
|
||||
.B mtu
|
||||
bytes/cell
|
||||
.B ] ]
|
||||
.P
|
||||
burst is also known as buffer and maxburst. mtu is also known as minburst.
|
||||
.SH DESCRIPTION
|
||||
|
||||
The Token Bucket Filter is a classless queueing discipline available for
|
||||
traffic control with the
|
||||
.BR tc (8)
|
||||
command.
|
||||
|
||||
TBF is a pure shaper and never schedules traffic. It is non-work-conserving and may throttle
|
||||
itself, although packets are available, to ensure that the configured rate is not exceeded.
|
||||
On all platforms except for Alpha,
|
||||
it is able to shape up to 1mbit/s of normal traffic with ideal minimal burstiness,
|
||||
sending out data exactly at the configured rates.
|
||||
|
||||
Much higher rates are possible but at the cost of losing the minimal burstiness. In that
|
||||
case, data is on average dequeued at the configured rate but may be sent much faster at millisecond
|
||||
timescales. Because of further queues living in network adaptors, this is often not a problem.
|
||||
|
||||
Kernels with a higher 'HZ' can achieve higher rates with perfect burstiness. On Alpha, HZ is ten
|
||||
times higher, leading to a 10mbit/s limit to perfection. These calculations hold for packets of on
|
||||
average 1000 bytes.
|
||||
|
||||
.SH ALGORITHM
|
||||
As the name implies, traffic is filtered based on the expenditure of
|
||||
.B tokens.
|
||||
Tokens roughly correspond to bytes, with the additional constraint that each packet consumes
|
||||
some tokens, no matter how small it is. This reflects the fact that even a zero-sized packet occupies
|
||||
the link for some time.
|
||||
|
||||
On creation, the TBF is stocked with tokens which correspond to the amount of traffic that can be burst
|
||||
in one go. Tokens arrive at a steady rate, until the bucket is full.
|
||||
|
||||
If no tokens are available, packets are queued, up to a configured limit. The TBF now
|
||||
calculates the token deficit, and throttles until the first packet in the queue can be sent.
|
||||
|
||||
If it is not acceptable to burst out packets at maximum speed, a peakrate can be configured
|
||||
to limit the speed at which the bucket empties. This peakrate is implemented as a second TBF
|
||||
with a very small bucket, so that it doesn't burst.
|
||||
|
||||
To achieve perfection, the second bucket may contain only a single packet, which leads to
|
||||
the earlier mentioned 1mbit/s limit.
|
||||
|
||||
This limit is caused by the fact that the kernel can only throttle for at minimum 1 'jiffy', which depends
|
||||
on HZ as 1/HZ. For perfect shaping, only a single packet can get sent per jiffy - for HZ=100, this means 100
|
||||
packets of on average 1000 bytes each, which roughly corresponds to 1mbit/s.
|
||||
|
||||
.SH PARAMETERS
|
||||
See
|
||||
.BR tc (8)
|
||||
for how to specify the units of these values.
|
||||
.TP
|
||||
limit or latency
|
||||
Limit is the number of bytes that can be queued waiting for tokens to become
|
||||
available. You can also specify this the other way around by setting the
|
||||
latency parameter, which specifies the maximum amount of time a packet can
|
||||
sit in the TBF. The latter calculation takes into account the size of the
|
||||
bucket, the rate and possibly the peakrate (if set). These two parameters
|
||||
are mutually exclusive.
|
||||
.TP
|
||||
burst
|
||||
Also known as buffer or maxburst.
|
||||
Size of the bucket, in bytes. This is the maximum amount of bytes that tokens can be available for instantaneously.
|
||||
In general, larger shaping rates require a larger buffer. For 10mbit/s on Intel, you need at least 10kbyte buffer
|
||||
if you want to reach your configured rate!
|
||||
|
||||
If your buffer is too small, packets may be dropped because more tokens arrive per timer tick than fit in your bucket.
|
||||
The minimum buffer size can be calculated by dividing the rate by HZ.
|
||||
|
||||
Token usage calculations are performed using a table which by default has a resolution of 8 packets.
|
||||
This resolution can be changed by specifying the
|
||||
.B cell
|
||||
size with the burst. For example, to specify a 6000 byte buffer with a 16
|
||||
byte cell size, set a burst of 6000/16. You will probably never have to set
|
||||
this. Must be an integral power of 2.
|
||||
.TP
|
||||
mpu
|
||||
A zero-sized packet does not use zero bandwidth. For ethernet, no packet uses less than 64 bytes. The Minimum Packet Unit
|
||||
determines the minimal token usage (specified in bytes) for a packet. Defaults to zero.
|
||||
.TP
|
||||
rate
|
||||
The speed knob. See remarks above about limits! See
|
||||
.BR tc (8)
|
||||
for units.
|
||||
.PP
|
||||
Furthermore, if a peakrate is desired, the following parameters are available:
|
||||
|
||||
.TP
|
||||
peakrate
|
||||
Maximum depletion rate of the bucket. Limited to 1mbit/s on Intel, 10mbit/s on Alpha. The peakrate does
|
||||
not need to be set, it is only necessary if perfect millisecond timescale shaping is required.
|
||||
|
||||
.TP
|
||||
mtu/minburst
|
||||
Specifies the size of the peakrate bucket. For perfect accuracy, should be set to the MTU of the interface.
|
||||
If a peakrate is needed, but some burstiness is acceptable, this size can be raised. A 3000 byte minburst
|
||||
allows around 3mbit/s of peakrate, given 1000 byte packets.
|
||||
|
||||
Like the regular burstsize you can also specify a
|
||||
.B cell
|
||||
size.
|
||||
.SH EXAMPLE & USAGE
|
||||
|
||||
To attach a TBF with a sustained maximum rate of 0.5mbit/s, a peakrate of 1.0mbit/s,
|
||||
a 5kilobyte buffer, with a pre-bucket queue size limit calculated so the TBF causes
|
||||
at most 70ms of latency, with perfect peakrate behaviour, issue:
|
||||
.P
|
||||
# tc qdisc add dev eth0 root tbf rate 0.5mbit \\
|
||||
burst 5kb latency 70ms peakrate 1mbit \\
|
||||
minburst 1540
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR tc (8)
|
||||
|
||||
.SH AUTHOR
|
||||
Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>. This manpage maintained by
|
||||
bert hubert <ahu@ds9a.nl>
|
||||
|
||||
|
||||
348
man/man8/tc.8
348
man/man8/tc.8
|
|
@ -0,0 +1,348 @@
|
|||
.TH TC 8 "16 December 2001" "iproute2" "Linux"
|
||||
.SH NAME
|
||||
tc \- show / manipulate traffic control settings
|
||||
.SH SYNOPSIS
|
||||
.B tc qdisc [ add | change | replace | link ] dev
|
||||
DEV
|
||||
.B
|
||||
[ parent
|
||||
qdisc-id
|
||||
.B | root ]
|
||||
.B [ handle
|
||||
qdisc-id ] qdisc
|
||||
[ qdisc specific parameters ]
|
||||
.P
|
||||
|
||||
.B tc class [ add | change | replace ] dev
|
||||
DEV
|
||||
.B parent
|
||||
qdisc-id
|
||||
.B [ classid
|
||||
class-id ] qdisc
|
||||
[ qdisc specific parameters ]
|
||||
.P
|
||||
|
||||
.B tc filter [ add | change | replace ] dev
|
||||
DEV
|
||||
.B [ parent
|
||||
qdisc-id
|
||||
.B | root ] protocol
|
||||
protocol
|
||||
.B prio
|
||||
priority filtertype
|
||||
[ filtertype specific parameters ]
|
||||
.B flowid
|
||||
flow-id
|
||||
|
||||
.B tc [-s | -d ] qdisc show [ dev
|
||||
DEV
|
||||
.B ]
|
||||
.P
|
||||
.B tc [-s | -d ] class show dev
|
||||
DEV
|
||||
.P
|
||||
.B tc filter show dev
|
||||
DEV
|
||||
|
||||
.SH DESCRIPTION
|
||||
.B Tc
|
||||
is used to configure Traffic Control in the Linux kernel. Traffic Control consists
|
||||
of the following:
|
||||
|
||||
.TP
|
||||
SHAPING
|
||||
When traffic is shaped, its rate of transmission is under control. Shaping may
|
||||
be more than lowering the available bandwidth - it is also used to smooth out
|
||||
bursts in traffic for better network behaviour. Shaping occurs on egress.
|
||||
|
||||
.TP
|
||||
SCHEDULING
|
||||
By scheduling the transmission of packets it is possible to improve interactivity
|
||||
for traffic that needs it while still guaranteeing bandwidth to bulk transfers. Reordering
|
||||
is also called prioritizing, and happens only on egress.
|
||||
|
||||
.TP
|
||||
POLICING
|
||||
Where shaping deals with transmission of traffic, policing pertains to traffic
|
||||
arriving. Policing thus occurs on ingress.
|
||||
|
||||
.TP
|
||||
DROPPING
|
||||
Traffic exceeding a set bandwidth may also be dropped forthwith, both on
|
||||
ingress and on egress.
|
||||
|
||||
.P
|
||||
Processing of traffic is controlled by three kinds of objects: qdiscs,
|
||||
classes and filters.
|
||||
|
||||
.SH QDISCS
|
||||
.B qdisc
|
||||
is short for 'queueing discipline' and it is elementary to
|
||||
understanding traffic control. Whenever the kernel needs to send a
|
||||
packet to an interface, it is
|
||||
.B enqueued
|
||||
to the qdisc configured for that interface. Immediately afterwards, the kernel
|
||||
tries to get as many packets as possible from the qdisc, for giving them
|
||||
to the network adaptor driver.
|
||||
|
||||
A simple QDISC is the 'pfifo' one, which does no processing at all and is a pure
|
||||
First In, First Out queue. It does however store traffic when the network interface
|
||||
can't handle it momentarily.
|
||||
|
||||
.SH CLASSES
|
||||
Some qdiscs can contain classes, which contain further qdiscs - traffic may
|
||||
then be enqueued in any of the inner qdiscs, which are within the
|
||||
.B classes.
|
||||
When the kernel tries to dequeue a packet from such a
|
||||
.B classful qdisc
|
||||
it can come from any of the classes. A qdisc may for example prioritize
|
||||
certain kinds of traffic by trying to dequeue from certain classes
|
||||
before others.
|
||||
|
||||
.SH FILTERS
|
||||
A
|
||||
.B filter
|
||||
is used by a classful qdisc to determine in which class a packet will
|
||||
be enqueued. Whenever traffic arrives at a class with subclasses, it needs
|
||||
to be classified. Various methods may be employed to do so, one of these
|
||||
are the filters. All filters attached to the class are called, until one of
|
||||
them returns with a verdict. If no verdict was made, other criteria may be
|
||||
available. This differs per qdisc.
|
||||
|
||||
It is important to notice that filters reside
|
||||
.B within
|
||||
qdiscs - they are not masters of what happens.
|
||||
|
||||
.SH CLASSLESS QDISCS
|
||||
The classless qdiscs are:
|
||||
.TP
|
||||
[p|b]fifo
|
||||
Simplest usable qdisc, pure First In, First Out behaviour. Limited in
|
||||
packets or in bytes.
|
||||
.TP
|
||||
pfifo_fast
|
||||
Standard qdisc for 'Advanced Router' enabled kernels. Consists of a three-band
|
||||
queue which honors Type of Service flags, as well as the priority that may be
|
||||
assigned to a packet.
|
||||
.TP
|
||||
red
|
||||
Random Early Detection simulates physical congestion by randomly dropping
|
||||
packets when nearing configured bandwidth allocation. Well suited to very
|
||||
large bandwidth applications.
|
||||
.TP
|
||||
sfq
|
||||
Stochastic Fairness Queueing reorders queued traffic so each 'session'
|
||||
gets to send a packet in turn.
|
||||
.TP
|
||||
tbf
|
||||
The Token Bucket Filter is suited for slowing traffic down to a precisely
|
||||
configured rate. Scales well to large bandwidths.
|
||||
.SH CONFIGURING CLASSLESS QDISCS
|
||||
In the absence of classful qdiscs, classless qdiscs can only be attached at
|
||||
the root of a device. Full syntax:
|
||||
.P
|
||||
.B tc qdisc add dev
|
||||
DEV
|
||||
.B root
|
||||
QDISC QDISC-PARAMETERS
|
||||
|
||||
To remove, issue
|
||||
.P
|
||||
.B tc qdisc del dev
|
||||
DEV
|
||||
.B root
|
||||
|
||||
The
|
||||
.B pfifo_fast
|
||||
qdisc is the automatic default in the absence of a configured qdisc.
|
||||
|
||||
.SH CLASSFUL QDISCS
|
||||
The classful qdiscs are:
|
||||
.TP
|
||||
CBQ
|
||||
Class Based Queueing implements a rich linksharing hierarchy of classes.
|
||||
It contains shaping elements as well as prioritizing capabilities. Shaping is
|
||||
performed using link idle time calculations based on average packet size and
|
||||
underlying link bandwidth. The latter may be ill-defined for some interfaces.
|
||||
.TP
|
||||
HTB
|
||||
The Hierarchy Token Bucket implements a rich linksharing hierarchy of
|
||||
classes with an emphasis on conforming to existing practices. HTB facilitates
|
||||
guaranteeing bandwidth to classes, while also allowing specification of upper
|
||||
limits to inter-class sharing. It contains shaping elements, based on TBF and
|
||||
can prioritize classes.
|
||||
.TP
|
||||
PRIO
|
||||
The PRIO qdisc is a non-shaping container for a configurable number of
|
||||
classes which are dequeued in order. This allows for easy prioritization
|
||||
of traffic, where lower classes are only able to send if higher ones have
|
||||
no packets available. To facilitate configuration, Type Of Service bits are
|
||||
honored by default.
|
||||
.SH THEORY OF OPERATION
|
||||
Classes form a tree, where each class has a single parent.
|
||||
A class may have multiple children. Some qdiscs allow for runtime addition
|
||||
of classes (CBQ, HTB) while others (PRIO) are created with a static number of
|
||||
children.
|
||||
|
||||
Qdiscs which allow dynamic addition of classes can have zero or more
|
||||
subclasses to which traffic may be enqueued.
|
||||
|
||||
Furthermore, each class contains a
|
||||
.B leaf qdisc
|
||||
which by default has
|
||||
.B pfifo
|
||||
behaviour though another qdisc can be attached in place. This qdisc may again
|
||||
contain classes, but each class can have only one leaf qdisc.
|
||||
|
||||
When a packet enters a classful qdisc it can be
|
||||
.B classified
|
||||
to one of the classes within. Three criteria are available, although not all
|
||||
qdiscs will use all three:
|
||||
.TP
|
||||
tc filters
|
||||
If tc filters are attached to a class, they are consulted first
|
||||
for relevant instructions. Filters can match on all fields of a packet header,
|
||||
as well as on the firewall mark applied by ipchains or iptables. See
|
||||
.BR tc-filters (8).
|
||||
.TP
|
||||
Type of Service
|
||||
Some qdiscs have built in rules for classifying packets based on the TOS field.
|
||||
.TP
|
||||
skb->priority
|
||||
Userspace programs can encode a class-id in the 'skb->priority' field using
|
||||
the SO_PRIORITY option.
|
||||
.P
|
||||
Each node within the tree can have its own filters but higher level filters
|
||||
may also point directly to lower classes.
|
||||
|
||||
If classification did not succeed, packets are enqueued to the leaf qdisc
|
||||
attached to that class. Check qdisc specific manpages for details, however.
|
||||
|
||||
.SH NAMING
|
||||
All qdiscs, classes and filters have IDs, which can either be specified
|
||||
or be automatically assigned.
|
||||
|
||||
IDs consist of a major number and a minor number, separated by a colon.
|
||||
|
||||
.TP
|
||||
QDISCS
|
||||
A qdisc, which potentially can have children,
|
||||
gets assigned a major number, called a 'handle', leaving the minor
|
||||
number namespace available for classes. The handle is expressed as '10:'.
|
||||
It is customary to explicitly assign a handle to qdiscs expected to have
|
||||
children.
|
||||
|
||||
.TP
|
||||
CLASSES
|
||||
Classes residing under a qdisc share their qdisc major number, but each have
|
||||
a separate minor number called a 'classid' that has no relation to their
|
||||
parent classes, only to their parent qdisc. The same naming custom as for
|
||||
qdiscs applies.
|
||||
|
||||
.TP
|
||||
FILTERS
|
||||
Filters have a three part ID, which is only needed when using a hashed
|
||||
filter hierarchy, for which see
|
||||
.BR tc-filters (8).
|
||||
.SH UNITS
|
||||
All parameters accept a floating point number, possibly followed by a unit.
|
||||
.P
|
||||
Bandwidths or rates can be specified in:
|
||||
.TP
|
||||
kbps
|
||||
Kilobytes per second
|
||||
.TP
|
||||
mbps
|
||||
Megabytes per second
|
||||
.TP
|
||||
kbit
|
||||
Kilobits per second
|
||||
.TP
|
||||
mbit
|
||||
Megabits per second
|
||||
.TP
|
||||
bps or a bare number
|
||||
Bytes per second
|
||||
.P
|
||||
Amounts of data can be specified in:
|
||||
.TP
|
||||
kb or k
|
||||
Kilobytes
|
||||
.TP
|
||||
mb or m
|
||||
Megabytes
|
||||
.TP
|
||||
mbit
|
||||
Megabits
|
||||
.TP
|
||||
kbit
|
||||
Kilobits
|
||||
.TP
|
||||
b or a bare number
|
||||
Bytes.
|
||||
.P
|
||||
Lengths of time can be specified in:
|
||||
.TP
|
||||
s, sec or secs
|
||||
Whole seconds
|
||||
.TP
|
||||
ms, msec or msecs
|
||||
Milliseconds
|
||||
.TP
|
||||
us, usec, usecs or a bare number
|
||||
Microseconds.
|
||||
|
||||
.SH TC COMMANDS
|
||||
The following commands are available for qdiscs, classes and filter:
|
||||
.TP
|
||||
add
|
||||
Add a qdisc, class or filter to a node. For all entities, a
|
||||
.B parent
|
||||
must be passed, either by passing its ID or by attaching directly to the root of a device.
|
||||
When creating a qdisc or a filter, it can be named with the
|
||||
.B handle
|
||||
parameter. A class is named with the
|
||||
.B classid
|
||||
parameter.
|
||||
|
||||
.TP
|
||||
remove
|
||||
A qdisc can be removed by specifying its handle, which may also be 'root'. All subclasses and their leaf qdiscs
|
||||
are automatically deleted, as well as any filters attached to them.
|
||||
|
||||
.TP
|
||||
change
|
||||
Some entities can be modified 'in place'. Shares the syntax of 'add', with the exception
|
||||
that the handle cannot be changed and neither can the parent. In other words,
|
||||
.B
|
||||
change
|
||||
cannot move a node.
|
||||
|
||||
.TP
|
||||
replace
|
||||
Performs a nearly atomic remove/add on an existing node id. If the node does not exist yet
|
||||
it is created.
|
||||
|
||||
.TP
|
||||
link
|
||||
Only available for qdiscs and performs a replace where the node
|
||||
must exist already.
|
||||
|
||||
|
||||
.SH HISTORY
|
||||
.B tc
|
||||
was written by Alexey N. Kuznetsov and added in Linux 2.2.
|
||||
.SH SEE ALSO
|
||||
.BR tc-cbq (8),
|
||||
.BR tc-htb (8),
|
||||
.BR tc-sfq (8),
|
||||
.BR tc-red (8),
|
||||
.BR tc-tbf (8),
|
||||
.BR tc-pfifo (8),
|
||||
.BR tc-bfifo (8),
|
||||
.BR tc-pfifo_fast (8),
|
||||
.BR tc-filters (8)
|
||||
|
||||
.SH AUTHOR
|
||||
Manpage maintained by bert hubert (ahu@ds9a.nl)
|
||||
|
||||
Loading…
Reference in New Issue