An Introduction to Tornado Codes
Tornado codes belong to a new class of error-correcting codes for erasure channels, based on randomly connected irregular bipartite graphs. With a carefully chosen graph structure, tornado codes can achieve nearly optimal efficiency, and their linear-time encoding and decoding with rather simple and fast algorithms makes them well suited for encoding large amounts of data, and as an alternative to the classical ARQ concept used on the Internet.
This paper starts with a short introduction to error-correcting codes and then continues with the principal idea and construction of tornado codes. Next, the decoding step is analyzed, and from that, regular and irregular graphs with nearly optimal reception efficiency are derived. After a short comparison between tornado codes and Reed-Solomon codes (the most commonly used error-correcting codes), our own implementation of tornado codes is outlined. The last two chapters present possible applications for tornado codes and a short outlook on future developments.
Introduction to Error-Correcting Codes
Every message transmitted from a sender to some recipient is inevitably affected by noise within the channel the message passes through. Noise has different causes, such as thermal noise within the channel or interference with other communication, and manifests as errors within the delivered message. Naturally, the aim of any communication is reliable delivery of information, which is equivalent to minimizing errors within the message. There are different ways to accomplish this goal. Directly, one could reduce the amount of noise within the channel, e.g. by using better insulation for the channel, or one could simply use a more powerful signal. An indirect solution is for the receiver to simply request retransmission of any message it has not received correctly. This technique is known as Automatic Repeat reQuest (ARQ), and is widely used within the Internet for lost or incorrectly received packets.
Watching human conversation one can observe that with a low number of errors a message is still understood correctly. This is possible because human language contains lots of redundancy, enabling the recipient to reconstruct the original message. This leads to a different kind of solution for achieving reliable communication: the use of forward error-correcting codes.
Forward Error Correction
Generally speaking, Forward Error Correction (FEC) means the addition of redundancy to information in a way that errors after transmission can be detected or corrected. Trivial error-correcting codes are the simple parity check, adding a single redundant bit as the sum of all data bits modulo 2, and the repetition code, which repeats every character multiple times. The simple parity check can detect single bit errors within a block, while the repetition code of length 2t+1 can correct up to t errors. For this, the character encountered most often is regarded as the most probable one. However, with increasing error-correcting capability, messages using the repetition code get very large. The code rate R describes the ratio between the number of original bits (data bits) and the total number of bits.
code rate R = k/n
k ... number of data bits (data length)
n ... total number of bits (block length)
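The two trivial codes above can be sketched in a few lines. This is a minimal illustration (not from the paper's own code) of a single parity bit and of majority-vote decoding for a repetition code of length 2t+1, which corrects up to t errors:

```python
def parity_encode(bits):
    # Append the sum of all data bits modulo 2 as a check bit.
    return bits + [sum(bits) % 2]

def parity_check(word):
    # A single flipped bit makes the overall parity odd.
    return sum(word) % 2 == 0

def repetition_decode(word, reps):
    # Majority vote over each group of `reps` received copies.
    out = []
    for i in range(0, len(word), reps):
        group = word[i:i + reps]
        out.append(1 if sum(group) > len(group) // 2 else 0)
    return out
```

For the repetition code of length 3 (t = 1), the code rate is only 1/3, which illustrates how expensive repetition coding becomes for higher error-correcting capability.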
The assertion that for the repetition code the character encountered most often is the most probable one is based on the assumption that errors occur independently of the character, with equal probability, and independently of each other. If such an idealized channel model has only two different kinds of characters, it is called a Binary Symmetric Channel (BSC). Because errors occur independently of their position, this model is also often referred to as a discrete memoryless channel.
Another important channel model is the Binary Erasure Channel (BEC). Within this model, the positions of the errors are known; such errors are therefore usually called erasures. Because an erasure is detected right away, it can be corrected immediately, so the BEC offers better error-correction capability than the BSC.
Noisy Channel Coding Theorem
In the early days of electronic data communication, when analog techniques were still dominant, it was assumed that the only way to achieve arbitrarily low transmission error rates was to use either sufficiently high signal power or a sufficient amount of redundant information. In 1948 Claude Shannon, then employed as a researcher at the famous Bell Labs, surprised the experts with his publication entitled "A Mathematical Theory of Communication", which showed that under certain circumstances the probability of error of a FEC block tends towards 0 as the block length grows [12,5,6,20].
First let the capacity C of a channel be defined as follows:
C = W · log2(1 + S/N) bits/sec.
C depends on the bandwidth W and the signal-to-noise ratio S/N, and is directly related to the error probability p of the channel. For the BSC the capacity is given by
C_BSC = 1 + p·log2(p) + (1-p)·log2(1-p),
and for the BEC by
C_BEC = 1 - p.
The central theorem in Shannon's work, known as the Noisy Channel Coding Theorem, states that there exist FEC codes for which the error probability of a FEC block tends towards 0 with growing block length, under the condition that the code rate is lower than the capacity of the channel. Therefore:
∃ FEC codes: R < C: lim(n→∞) P(error) = 0
Conversely, if the code rate is larger than the capacity, then the error probability tends towards 1:
R > C: lim(n→∞) P(error) = 1
For an arbitrarily low error rate, and therefore reliable communication, it is thus not necessary to reduce the code rate to a minimum; with suitable codes the code rate can go up to the capacity of the channel, which is also known as the Shannon bound. (In this respect repetition codes are very inefficient.) The proof of Shannon's theorem is based on probability theory and does not show how to construct such codes. The construction of FEC codes that get as close as possible to the Shannon bound has since been a main challenge for researchers.
Richard Hamming, also employed by Bell Labs, was one of the first to realize the essential idea behind constructing FEC codes. The Hamming distance d between two vectors X and Y is defined as the number of positions in which the two vectors differ. The bigger the Hamming distance between each pair of code words of a FEC block, seen as vectors of characters, the more errors can be detected or corrected. A code with a minimum Hamming distance dmin between each pair of code words can detect dmin-1 bit errors and correct ⌊(dmin-1)/2⌋ errors [3,20].
In 1950 Hamming published his paper about 1-bit error-correcting codes which were more efficient than simple repetition codes. The code presented, known as the (7,4)-Hamming code, required only 3 additional check bits for a block with 4 data bits. In comparison, the repetition code would have required 8 additional bits. Hamming codes have minimum Hamming distance dmin = 3.
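The (7,4)-Hamming code is small enough to write out directly. The following sketch (illustrative, not from the paper) encodes 4 data bits with 3 parity bits and corrects a single bit error; the syndrome, read as a binary number, gives the 1-based position of the error:

```python
def hamming74_encode(d):
    # d = [d1, d2, d3, d4]; code word positions 1..7, parity at 1, 2, 4.
    d1, d2, d3, d4 = d
    p1 = (d1 + d2 + d4) % 2   # covers positions 3, 5, 7
    p2 = (d1 + d3 + d4) % 2   # covers positions 3, 6, 7
    p3 = (d2 + d3 + d4) % 2   # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    # Each syndrome bit re-checks one parity group; together they
    # spell out the error position (0 means no error detected).
    c = list(c)
    s1 = (c[0] + c[2] + c[4] + c[6]) % 2  # positions 1, 3, 5, 7
    s2 = (c[1] + c[2] + c[5] + c[6]) % 2  # positions 2, 3, 6, 7
    s3 = (c[3] + c[4] + c[5] + c[6]) % 2  # positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1                   # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]       # extract d1, d2, d3, d4
```

With dmin = 3 this corrects exactly ⌊(3-1)/2⌋ = 1 error, matching the formula above.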
It is rumoured that Hamming invented this code after numerous failed attempts at running his programs on a mechanical relay computer, which he could only use on weekends, due to the occurrence of parity errors the machine detected: "Damn it, if the machine can detect an error, why can't it locate the position of the error and correct it?"
Maximum Likelihood Decoding
During decoding (error correction), each received word is assigned the nearest code word with respect to the Hamming distance, which corresponds to minimizing the error probability. In general, this problem is NP-complete.
Linear Block Codes
Hamming codes are a special case of linear block codes. A block code is a FEC code which encodes blocks of characters instead of single characters. Linear means that every linear combination of valid code words constitutes a valid code word as well. Because of this property a linear code can be viewed as a k-dimensional subspace GF(q)^k of the vector space GF(q)^n, where GF(q) denotes a Galois field of order q, and the set of all code words can be represented by a matrix G of size (k × n). G is called the generator matrix because all code words can be generated using this matrix:
c = d·G = (d0, ..., dk-1) · [Ik | P] = (d0, ..., dk-1, pk, ..., pn-1)
When the left part of matrix G equals the identity matrix Ik, as displayed above, the code is called a systematic code, because the first k symbols of the code word are exactly the data symbols.
Decoding of a code word works with the help of the parity-check matrix H. The code words form the null space of H, so a word is a code word if and only if its syndrome is the zero vector:
syndrome s(c) = c·H^T = 0
The complexity of linear block codes in general is O(n²) for the encoding step, while decoding is NP-complete because of the problem of maximum likelihood decoding. One solution for decoding is the use of look-up tables, but this is only practical for small block lengths. However, there exist a number of special cases of linear codes for which the decoding problem can be solved in polynomial time. These special cases are the ones used in practice.
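The generator/parity-check relationship above can be made concrete over GF(2). The parity matrix P below is an arbitrary illustrative choice, not taken from the text; encoding appends the parity bits d·P, and the syndrome re-computes them and compares:

```python
# A systematic binary linear block code with G = [I_k | P] and
# H = [P^T | I_{n-k}]: s(c) = c * H^T = 0 exactly for code words.

P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 1, 1]]          # k = 4 data bits, n - k = 3 check bits

def encode(d):
    # c = d * [I_k | P]: the data bits followed by the parity bits.
    checks = [sum(d[i] * P[i][j] for i in range(4)) % 2 for j in range(3)]
    return d + checks

def syndrome(c):
    # Zero vector for every valid code word; nonzero reveals an error.
    return [(sum(c[i] * P[i][j] for i in range(4)) + c[4 + j]) % 2
            for j in range(3)]
```

Because the first k symbols of each code word are the data symbols themselves, this toy code is systematic in the sense described above.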
Important FEC Codes Until 1960
Within the following years, a number of important error-correcting codes were invented; the most important are listed in the table below:
Reed-Solomon codes (RS codes) [3,19,16] were introduced by Reed and Solomon in 1960. RS codes constitute a special case of linear block codes. An exact classification is given by the following subset representation:
RS codes ⊂ BCH codes ⊂ cyclic codes ⊂ linear block codes
RS codes are indisputably the economically most important FEC codes because they have a number of outstanding properties. They are so-called maximum distance separable (MDS) codes, i.e. they have the maximum possible minimum Hamming distance dmin = n-k+1 between each pair of code words, and therefore can detect up to n-k and correct up to ⌊(n-k)/2⌋ errors. Within the BEC model a FEC block can thus be reconstructed after receiving any k symbols, which of course is an optimal property! For this, the efficiency of a FEC code shall be defined as:
efficiency = (# data symbols) / (# symbols required for decoding).
RS codes therefore have an efficiency of 1.
In the encoding step the redundant symbols are computed using (Lagrange) interpolation over the given k data symbols. For this, the polynomial
P(x) = c0 + c1·x + ... + ck-1·x^(k-1)
shall be chosen in a way that P(0) ≡ d0, ..., P(k-1) ≡ dk-1, where d0 to dk-1 denote the original k data symbols. Then the encoded symbols are given by P(0), ..., P(n-1). RS codes are therefore systematic codes.
Decoding within the BEC model works analogously to encoding, by interpolating among k (correctly) received symbols to reconstruct the missing ones. Error correction within the BSC is substantially more complicated, and comprises computing the syndrome, locating the error positions, and finding the correct symbols, which essentially requires solving a system of linear equations. It is interesting to note that the required algorithms were invented only much later (Berlekamp-Massey, 1968, and Chien search, 1964).
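The interpolation idea behind RS erasure decoding can be sketched as follows. This is a simplified illustration over the prime field GF(257) (real RS codes use GF(2^m), but the interpolation step is the same idea), with hypothetical helper names:

```python
Q = 257  # a prime larger than any symbol or evaluation point used here

def lagrange_eval(xs, ys, x):
    # Evaluate, at x, the unique degree < k polynomial through the
    # points (xs[j], ys[j]), with all arithmetic modulo Q.
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x - xm) % Q
                den = den * (xj - xm) % Q
        # pow(den, Q-2, Q) is the modular inverse of den (Fermat).
        total = (total + yj * num * pow(den, Q - 2, Q)) % Q
    return total

def rs_encode(data, n):
    # P(i) = data[i] for i < k; the code word is P(0), ..., P(n-1),
    # so the first k symbols are the data itself (systematic code).
    k = len(data)
    return [lagrange_eval(list(range(k)), data, x) for x in range(n)]

def rs_erasure_decode(received, k):
    # received: (position, value) pairs; any k of them determine P.
    xs = [p for p, _ in received[:k]]
    ys = [v for _, v in received[:k]]
    return [lagrange_eval(xs, ys, x) for x in range(k)]
```

Note how any k received (position, value) pairs suffice, which is exactly the MDS property described above.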
The algorithms used in practice have an encoding complexity of O(n log n) and a decoding complexity of O(n²). (Using FFT techniques one could get a decoding complexity of O(n log²n log log n), however with quite large constants hidden in the O notation.) Because of the quadratic complexity and the costly computations (GF arithmetic), the maximum practical block length is small, typically n ≤ 255.
Because of their excellent properties RS codes are used ubiquitously, among others within:
The FEC codes developed by Gallager in 1963 as part of his dissertation at MIT mark the origin of Tornado codes [4,7,13,17]. These Gallager codes are based on regular bipartite graphs. On the left side there are the data as well as parity bits. q of these bits at a time are chosen randomly and connected to one of the nodes on the right side, with the bits selected such that their sum modulo 2 equals 0. The node degree is kept constant on each side. Here is a diagram of a Gallager code as a bipartite graph:
In this form Gallager codes are easily representable as linear codes. The parity-check matrix H then has the following form:
In each row there are q ones and in each column p ones, q being the right and p the left node degree. The total number of ones corresponds to the number of edges in the representation as a bipartite graph. Because the number of ones is rather low in general, the parity-check matrix is sparse. Because of this, these codes are also referred to as Low-Density Parity-Check (LDPC) codes. For solving a system of linear equations with sparse matrices there exist specialized (faster) algorithms, which was probably one of Gallager's intentions in designing these codes this way. However, encoding is complicated because computing the generator matrix requires solving a system of linear equations with dense matrices, so a complete matrix multiplication must be carried out. This is extremely time consuming.
Gallager codes are not optimal codes. In particular, the random connection of nodes results in a variable efficiency that only converges as the block length goes to infinity. Thus, Gallager codes only become useful for large block lengths.
Because of the large amounts of data, the complicated computations, and the invention of Reed-Solomon codes at about the same time, Gallager codes were forgotten for about 30 years, until they were re-discovered by MacKay and Neal in 1995, and they have been under constant further development since. The final breakthrough was reached in 1997 by Luby, Mitzenmacher, Shokrollahi, Spielman, and Stemann with their paper "Practical Loss-Resilient Codes". From this work the so-called Tornado codes emerged.
Tornado codes are, just like Gallager codes, systematic FEC codes for large block lengths. The underlying channel model of these codes is the Binary Erasure Channel (errors are thus independently distributed). In contrast to Gallager codes, their construction is not based on regular bipartite graphs but on irregular bipartite graphs, whose structure is extremely important.
Tornado codes can be encoded and decoded in O(n·ln(1/ε)) time, where ε is a freely chosen positive constant. The maximum reliable code rate, i.e. the rate for which the FEC block error probability converges towards 0, is then given by 1 - p/(1-ε), with p denoting the error probability of the BEC. Via the parameter ε this value can get arbitrarily close to the capacity of the channel (i.e. the Shannon bound). Therefore, Tornado codes can be considered "nearly" optimal codes (in other words, "nearly" MDS codes) with an efficiency of about 1/(1+ε).
Because of the linear complexity and the comparably simple computation Tornado codes are very fast, and also the principal algorithms for the encoder and decoder are simple.
Construction of Tornado Codes
For the construction of Tornado codes, the parity bits of the Gallager codes get moved to the right side of the bipartite graph and are considered check bits. On the left side there are n data bits, and the right side is viewed as containing β·n check bits (β < 1).
For the encoding the check bits get constructed by computing the sum (XOR) of all adjacent data bits.
For decoding, a missing data bit can be reconstructed if the value of the check bit and all but one adjacent data bits are known, by computing the sum (XOR) of the check bit with all known data bits.
The aim of the whole decoding procedure is that repeated decoding steps eventually reconstruct all missing data bits. This should work if at most (1-ε)·β·n data bits are missing. By removing the "decoded" (used) edges within each decoding step, decoding achieves linear complexity in the number of edges, just as the encoding does. Nodes whose degree drops to 0 are treated as removed as well.
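A single encoding/decoding step on the bipartite graph can be sketched as follows (illustrative helper names; `adj` lists the data-bit indices adjacent to one check node): a check bit is the XOR of its adjacent data bits, and if exactly one of those data bits is missing, XOR-ing the check bit with the known ones recovers it.

```python
def make_check(data, adj):
    # Encoding step: the check bit is the XOR of all adjacent data bits.
    c = 0
    for i in adj:
        c ^= data[i]
    return c

def recover_missing(data, adj, check):
    # Decoding step: data[i] is None for erased positions; exactly one
    # adjacent data bit may be missing for this step to apply.
    missing = [i for i in adj if data[i] is None]
    assert len(missing) == 1, "decoding step needs exactly one erasure"
    v = check
    for i in adj:
        if data[i] is not None:
            v ^= data[i]
    return missing[0], v
```

The XOR cancellation works because x ^ x = 0: XOR-ing the check bit with every known neighbour leaves exactly the missing bit.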
Using the principle described so far, missing data bits can be reconstructed from the check bits. But how can missing check bits themselves be recovered, so that they remain useful for reconstructing other data bits? The solution is to cascade the graphs, so that a series of codes C(B0), ..., C(Bm) over the graphs B0, ..., Bm emerges. The first graph B0 contains, as before, the n data bits on its left side. The check bits of each graph Bi then serve as the data bits of the following graph Bi+1. Therefore, every graph Bi contains β^i·n left and β^(i+1)·n right nodes. This is repeated recursively until β^(i+1)·n ≤ √n. For these last check bits an ordinary erasure-correcting code C with code rate 1-β can be used which, with high probability, can reconstruct all missing data for a loss of up to a fraction β.
Under the condition that C can encode and decode in O(n²) time, the overall complexity remains linear, because O((√n)²) = O(n).
A possible candidate for the code C is the Reed-Solomon erasure-correcting code.
The whole code (B0, ..., Bm, C) is therefore a code with n data bits and
∑(i=1..m+1) β^i·n + β^(m+2)·n/(1-β) = n·β/(1-β)
check bits, and thus has a code rate of 1-β. If every sub-code can be decoded with high probability at a maximum loss of a fraction β(1-ε) of its data, then the whole code can be decoded with high probability at a maximum loss of a fraction β(1-ε) of the data. The goal is now to find suitable sub-graphs.
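The geometric telescoping of the cascade can be checked numerically. The sketch below (hypothetical helper, assuming the layer sizes β^(i+1)·n described above) lists the right-side sizes of the layers and the check bits of the final code C; their sum equals n·β/(1-β):

```python
import math

def cascade_sizes(n, beta):
    # Right-side sizes beta^(i+1) * n of the cascaded graphs; the
    # recursion stops once a right side has at most sqrt(n) nodes.
    sizes = []
    right = n * beta
    while right > math.sqrt(n):
        sizes.append(right)
        right *= beta
    sizes.append(right)               # the first layer with <= sqrt(n) nodes
    tail = right * beta / (1 - beta)  # check bits added by the final code C
    return sizes, tail
```

Whatever the stopping depth m, the total check-bit count telescopes to n·β/(1-β), which is why the overall code rate is exactly 1-β.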
Analysis of the Decoding Procedure
As described in the last chapter, the decoding procedure can only continue when there is a check bit on the right side of a sub-graph for which exactly one adjacent data bit is missing, i.e. a right node with degree 1. In this case the missing data bit gets reconstructed (computed from the check bit), all edges connected to the left node get removed (each time XOR-ing the reconstructed data bit into the adjacent check bit), and finally the left and right nodes get removed. This decoding step gets repeated as long as there are right nodes with degree 1. The whole decoding procedure is successful when all nodes on the left side have been removed.
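This iterative "peeling" procedure can be sketched for one sub-graph as follows (illustrative names; `checks` maps each check node to its value and its list of adjacent data indices):

```python
def peel_decode(data, checks):
    # data: list with None at erased positions.
    # checks: list of (check_value, adjacency list) pairs.
    # Repeatedly find a check node with exactly one erased neighbour,
    # recover that data bit, and continue; returns True on full recovery.
    progress = True
    while progress and any(d is None for d in data):
        progress = False
        for value, adj in checks:
            missing = [i for i in adj if data[i] is None]
            if len(missing) == 1:
                v = value
                for i in adj:
                    if data[i] is not None:
                        v ^= data[i]
                data[missing[0]] = v
                progress = True
    return all(d is not None for d in data)
```

The procedure stalls exactly when no right node of degree 1 remains, which is the failure case analyzed in the rest of this chapter.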
We now want to show that, under the condition of a maximum loss of a fraction β(1-ε) of the data, there is always a right node with degree 1 in the graph as long as left nodes exist. Because this argument is based on probability theory, only the asymptotic behavior can be analyzed.
For this we define an edge with left (right) degree i as an edge that is connected to a left (right) node with degree i. Let (λ1, ..., λm) be the vector holding the fractions of edges with left degree 1...m. Analogously, let (ρ1, ..., ρm) be the vector holding the fractions of edges with right degree 1...m.
Now, within every decoding step 1 out of E edges gets removed:
Δt := 1/E = 1 time unit = 1 decoding step, t = 0...1
Let li(t) be the proportion of edges with left degree i at time t. Analogously, let ri(t) be the proportion of edges with right degree i at time t. The proportion of all remaining edges e(t) at time t results in
e(t) := ∑li(t) = ∑ri(t)
Further, at every decoding step a right node with degree 1 gets chosen, and the adjacent left node and all its edges get removed. The probability that the left node degree is exactly i is li(t)/e(t), and in this case i edges of degree i get removed. This is expressed by the following equation:
Li(t+Δt) - Li(t) = - i · li(t)/e(t),
where Li(t) denotes the number of edges with left degree i, and therefore Li(t) = li(t) · E = li(t) / Δt. Because only the asymptotic behavior can be analyzed, we let the number of edges go to infinity: E→∞ and therefore Δt→0.
lim(Δt→0) [Li(t+Δt) - Li(t)] = lim(Δt→0) (li(t+Δt) - li(t)) / Δt = dli(t)/dt = -i · li(t)/e(t)
This differential equation can be solved more easily when we substitute dt/e(t) by dx/x. With the initial condition li(t=0) = δ·λi, where δ denotes the fraction of missing (i.e. remaining) left nodes, the proportion of edges with left degree i results in
li(x) = δ·λi·x^i,  x ∈ (0, 1]
Solving the differential equations for ri(x), in particular r1(x), is more complicated. The result for r1(x) is expressed using the so-called "degree sequence functions" (polynomials) λ(x) = ∑ λi·x^(i-1) and ρ(x) = ∑ ρi·x^(i-1):
r1(x) = δ·λ(x)·(x - 1 + ρ(1 - δ·λ(x))).
This proportion of right edges with degree 1 shall now always be bigger than 0, for all x ∈ (0, 1]. Solving this inequality leads to the central condition for a successful decoding procedure:
ρ(1 - δ·λ(x)) > 1 - x,  ∀ x ∈ (0, 1].
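The central condition is easy to check numerically for given degree sequences. The sketch below (hypothetical helper names) evaluates the degree-sequence polynomials from their coefficient lists and tests the inequality on a grid over (0, 1]:

```python
def poly(coeffs, x):
    # coeffs[i] is the fraction of edges of degree i + 1, so this
    # evaluates the degree sequence polynomial sum(c_i * x^i).
    return sum(c * x**i for i, c in enumerate(coeffs))

def condition_holds(lam, rho, delta, steps=1000):
    # Check rho(1 - delta*lambda(x)) > 1 - x on a grid over (0, 1].
    for s in range(1, steps + 1):
        x = s / steps
        if poly(rho, 1 - delta * poly(lam, x)) <= 1 - x:
            return False
    return True
```

For example, for the regular graph with λ(x) = x² and ρ(x) = x⁵ (left degree 3, right degree 6), the condition holds at δ = 0.4 but fails at δ = 0.5.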
Considering that these calculations describe the asymptotic behavior, the following is concluded:
Conversely, and perhaps more impressively, the following can be said:
Because this inequality describes only the asymptotic behavior exactly, in practice a few left nodes usually remain. However, if at most η% of the nodes remain, for some η > 0, and λ1 = λ2 = 0 holds, then the decoding procedure terminates successfully.
Regular vs. Irregular Graphs
If on the graph's left side there are n nodes of degree d, and on the right side there are β·n nodes, then these necessarily have degree d/β. The "degree sequence functions" then result in λ(x) = x^(d-1) and ρ(x) = x^(d/β-1). Now if d ≥ 3 then the central condition is only fulfilled when
δ ≤ 4 · (1 - (1/(d-1))^(1/(d/β-1))).
It is easy to see that the peak acceptable error rate δ for the decoding to be successful is achieved when d = 3. As an example let β = 0.5; then the highest acceptable error rate is at most 0.43, so to reconstruct a block at least 1.14 times the amount of original data must be received, a value far from the optimum of 1.
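The quoted value of roughly 0.43 can be reproduced numerically by bisecting for the largest δ that satisfies the central condition for the regular graph with d = 3 and β = 0.5, i.e. λ(x) = x² and ρ(x) = x⁵ (sketch with hypothetical helper names):

```python
def ok(delta, steps=2000):
    # Central condition for lambda(x) = x^2, rho(x) = x^5 on a grid.
    for s in range(1, steps + 1):
        x = s / steps
        if (1 - delta * x**2) ** 5 <= 1 - x:
            return False
    return True

def max_delta():
    # Bisect for the threshold between "condition holds" and "fails".
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo
```

The bisection converges to a value slightly below 0.43, matching the 1.14× reception overhead mentioned above (2·(1 - 0.43) ≈ 1.14 for a rate-1/2 code).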
Let the distribution of the left degrees ("left degree sequence") be given by the following "truncated heavy tail" distribution with parameter d:
λi = 1 / (H(d)·(i-1)),  i = 2, ..., d+1
where H(d) := ∑(i=1..d) 1/i, i.e. the harmonic sum up to 1/d. The average left node degree then results in al = H(d)·(d+1)/d ≈ ln(d).
Let the right degrees ("right degree sequence") be Poisson-distributed (the power series of the distribution can approximately be viewed as a polynomial):
ρi = e^(-α)·α^(i-1) / (i-1)!,  i ≥ 1.
The average right node degree then results in ar = α·e^α / (e^α - 1), with α chosen such that ar = al/β.
With the edge degree distributions given above, the central condition is fulfilled exactly when δ ≤ β / (1 + 1/d). The peak acceptable loss rate is hence determined by the parameter d. The higher d is chosen, the closer the code comes to the Shannon bound; we therefore have an asymptotically optimal code! The choice of d must be weighed against the encoding/decoding complexity, as d also determines the number of edges, n·al ≈ n·ln(d). The complexity for encoding and decoding therefore results in O(n·ln(d)).
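The left degree sequence and its average degree can be computed directly. This sketch (hypothetical helper names) builds the truncated heavy-tail fractions λi and derives the average left node degree; since λi is the fraction of *edges* of degree i, nodes of degree i make up a λi/i share, so the average node degree is 1 / ∑(λi/i), which matches al = H(d)·(d+1)/d:

```python
def heavy_tail(d):
    # lambda_i = 1 / (H(d) * (i - 1)) for i = 2 .. d+1.
    H = sum(1 / i for i in range(1, d + 1))   # harmonic sum H(d)
    lam = {i: 1 / (H * (i - 1)) for i in range(2, d + 2)}
    return lam, H

def avg_left_degree(lam):
    # Average node degree from edge-degree fractions lambda_i.
    return 1 / sum(l / i for i, l in lam.items())
```

The fractions also sum to 1, since ∑ 1/(i(i-1)) over i = 2..d+1 telescopes to d/(d+1), cancelling against H(d)·(d+1)/d.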
As noted in the last chapter, successful decoding demands λ2 = 0, which is not the case here. The following remedy is given: First, the graph as just described is constructed over the n left and (β-γ)n right nodes. Another graph is then layered over the n left and γn right nodes, with the left nodes having degree 3 and their edges being randomly connected with the γn right nodes. The parameter γ remains to be determined.
With β = 1-R and ε := 1/d, a code constructed this way, with any code rate R and some ε > 0, can with very high probability correct all errors within a large block, provided that at most a fraction (1-R)(1-ε) of the data is missing, and it can do this in time proportional to n·ln(1/ε).
Efficiency of Irregular Graphs
Efficiency compared to the average node degree:
Results from "Practical Loss-Resilient Codes":
Comparison between Tornado and Reed-Solomon Codes
The following tables give a short comparison of the most important properties of Tornado and Reed-Solomon codes.
Running times measured on a Sun 167 MHz UltraSPARC 1 with 64 MB RAM under Solaris 2.5.1:
Own Implementation of Tornado Codes
Header - Functions
The functions presented in the following constitute the interface of our own implementation of Tornado codes, and are taken directly from the header file of the source code.
Header - Data Structures
These data structures are taken from the header file as well, but are only intended for class-internal (private) use. They show how nodes and edges of the graph are represented internally, and they give insight into the memory requirements of the code.
Outline of the Main Functions' Inner Workings
The Encode() function:
The Decode() function:
The constructor – construction of the graph:
Why is the graph constructed this way? The computed number of nodes and edges is a real number, and discretizing it introduces errors in the distribution of edges over the nodes. With the algorithm described above a better edge distribution is achieved, and conflict situations, such as still having nodes with degree 0 but no more edges to assign, are avoided.
With this algorithm comparably good results have been achieved for data lengths up to ~20000 characters. In a test with a data length of 10000 characters, a code rate of 1/2, and d = 10, the average efficiency was about 91%, which agrees well with the theoretical expectation.
ARQ vs. FEC
Many networks, such as the Internet, traditionally use Automatic Repeat reQuest (ARQ) to achieve reliable communication. ARQ is good for transmission over channels with variable capacity, but bad or not applicable at all in a number of cases. Many receivers cause huge server and network loads, consuming lots of packet buffers and network bandwidth, and creating congestion. In cases where the return channel is limited, too slow, or not available at all, as in satellite and deep space communication, ARQ is not applicable. Within networks with huge latency it must also be considered that ARQ results in even more, and sometimes too much, latency.
A solution for these problems arises in the use of FEC codes or hybrid models. Note that only erasure-correcting codes are required, because received packets are always correct (their integrity is checked by checksums such as CRC), and lost packets equal erasures. FEC is used instead of ARQ, among others, in Reliable Multicast (RFC 3453) and ALC (RFC 3450-1) [10,4]. Hybrid models basically work in two ways: either data is sent with FEC information, and when a receiver experiences too many errors it uses ARQ, or data is sent without FEC and a client may request check bits in the case of errors using ARQ.
Software Distribution on the Internet
A good application area for tornado codes is software distribution within large networks such as the Internet [9,15,18]. This setting is characterized by one sender but many receivers (millions), and the problem of distributing large amounts of data. Examples are: distribution of new software (updates), financial information, database replication, audio/video transmission, and access to popular web sites.
By default within the Internet point-to-point connections are created. This has the following advantages:
However, the disadvantages are
In contrast, there is multicast and broadcast. They invert the problems of point-to-point connections, turn advantages to disadvantages...:
...and turn disadvantages to advantages:
To circumvent the problems of multicast and broadcast transmissions, traditionally a so-called data carousel is used, which uses FEC instead of ARQ. For the k data packets to be sent, n-k packets with check bits are constructed. An optimal code requires the recipient to receive any k packets to successfully reconstruct the original data. The packets are then sent cyclically, enabling a receiver to start, interrupt, and resume the download at any time.
Traditionally, Reed-Solomon is used as the FEC code. Because of this code's complexity it can only be applied over a small number of packets, however (≤ 255). Therefore, the object (file) is partitioned into a number of blocks. Additionally, some interleaving of the blocks is used for better protection against burst errors. The problem on the receiver side is then to receive packets for exactly those blocks it cannot yet decode. This problem is known as the "coupon collector's problem".
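The cost of the coupon collector's problem can be quantified in a simplified model (an illustration, not the full multi-packet-per-block setting): if each received packet belongs to one of B equally likely blocks and one packet per block suffices, the expected number of packets until every block has been hit at least once is B·H(B), noticeably more than B:

```python
def expected_draws(B):
    # Classic coupon collector expectation: B * (1 + 1/2 + ... + 1/B).
    return B * sum(1 / i for i in range(1, B + 1))
```

For B = 100 blocks this is already about 519 packets instead of 100, which illustrates why per-block Reed-Solomon coding wastes reception capacity.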
Using tornado codes, a FEC can be constructed over the entire object, and it suffices for the receiver to obtain any k(1+ε) packets to reconstruct the original data. In doing so, all data can be encoded and decoded in linear time (using a fixed ε). It must be pointed out that the Internet is no BEC as required for Tornado codes, but rather a burst-erasure channel. However, this problem can be elegantly circumvented by sending the packets in random order.
A data carousel using tornado codes essentially combines the advantages of point-to-point and multicast connections, summarized here:
Note that although there is no longer a need for packet buffers and retransmission timers, tornado codes have high memory demands, because the FEC is to be computed over the whole object.
Commercialization of Tornado Codes
Tornado codes have already seen interesting further developments. For example, there is research on Tornado and LDPC codes for the BSC. A very interesting further development is the invention of so-called rateless codes, such as the LT codes (Luby-Transform codes) [12,10] presented by Luby in 1998 and patented by Digital Fountain, as well as the Online codes published by Maymounkov in 2002. Rateless codes do not have a fixed code rate in the strict sense, as they can create an arbitrary number of different FEC blocks, all of which can be used for the decoding of an object.
Text and images may be used if credit is given, e.g. by including a public reference to this web site.