LZW COMPRESSION AND DECOMPRESSION
 
                 Copyright (C) 1993 Homer Wilson Smith
       Redistribution rights granted for non commercial purposes.
 
 
     The LZW compression algorithm is probably one of the most mind
boggling algorithms to come along, but it is also one of the most
brilliant, so it is worth understanding well.
 
     It's really quite mind opening.
 
     Ever program something you didn't fully understand, but because it
worked fine you let it go, and then months later found yourself still
wondering why your code worked?
 
     Well that's what happened to me with LZW compression.
 
     My programs worked, for a very long time they worked, until one day
I went to send a large tiff to a color separator, and bang nothing would
read it.
 
     Corel Draw wouldn't read it, CSHOW wouldn't read it, my own damn
tiff program wouldn't read it, and that REALLY pissed me off.
 
     I had written my tiff programs with barely passable understanding
of the algorithm, following along the pseudo code provided by others,
with little personal comprehension myself.  I had read a number of
descriptions trying to make LZW compression and decompression clear to
me, but basically all I ended up with was a partial understanding of the
encoding process and no understanding at all of the decoding process.  I
kind of programmed it blind, and hoped it would work.
 
     It took me many hours to find the bug, it was one of those 'it
crops up every hundred million years' kind of bug, and I didn't know
what I was doing anyhow, so it wasn't easy.
 
     But from all bad things comes something good (I think) and this
paper is the result.  I finally understood both sides of the compression
and decompression dichotomy, and I am reasonably sure my programs will
work because I understand them and not because I followed someone else's
code.
 
     The proof of course is in the pudding, and the pudding is whether
or not I can explain this to others so that they get it the first time.
The algorithm is really quite brilliant, and worth knowing just for the
fun of it.
 
     This paper is not meant to replace everything else you might read
on the subject of LZW encoding or TIFF and GIF files, but it is intended
to make all that other stuff clear.  This is what I wish someone had
written for me when I first approached the subject.
 
                          WHY DATA COMPRESSION
 
     The problem with data is it takes up so much damn space, and with
the cost of hard disk space and the slowness of telecommunications, it
behooves us to find a way to make data smaller, to make it take up less
room, so that we can then use up the space we free up to fill it with
more data!
 
     It actually takes less time to compress data than it does to
transmit it, so its a good cost saving to compress the data first, then
transmit it and uncompress it at the other end.
 
                          RUN LENGTH ENCODING
 
     Various encoding schemes have been devised to take care of this
need, one of the earliest most obvious ones being RUN LENGTH ENCODING.
Run length encoding takes advantage of the fact that often many pixels
right next to each other are the same pixel, so rather than sending them
all one by one they are counted up into a RUN and two bytes are sent
instead, the first byte telling how many pixels there were of one
number, and the next byte the number it was.
 
     Thus a run of pixels that looked like this
 
     5 5 5 3 3 3 9 9 9 9 9 1 1 1 1
 
     would be run length encoded as
 
     3 5 3 3 5 9 4 1
 
an obvious saving of space.  The longer the runs are the more space is
saved as each color run is represented by only two bytes, a count and a
color.  If your counts can go only up to 255, then if the run is longer
you have to start a new run, but if your counts can go up to 32767, then
you can handle longer runs before you have to start a new run.
 
     The problem comes in when the data does not consist of large
expanses of identical pixels, who wants to look at a single patch of red
anyhow.  A worst case scenario of course is when every pixel is
different such as,
 
     2 3 4 5 6 7 8 9   etc.
 
     Encoding this as RLE would produce,
 
     1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9
 
which is an obvious waste of space.
 
     One solution to this is to insert an INTRODUCER byte before each
sequence telling whether it is a horizontal run, or a stretch of one
byte per pixel.  Thus runs may start with positive counts, and stretches
of one byte per pixel may start with a negative count.
 
     Thus the sequence
 
     3 3 3 3 3 1 2 3 4 5 6
 
would be compressed as
 
     5 3 -6 1 2 3 4 5 6
 
giving some compression, but its obviously not an ideal solution.
 
     There is also tremendous complexity involved in programming the
ideal compression for data that has a lot of 2 byte runs in it.
 
     For example,
 
     3 3 4 5 3 3 4 5 3 3 4 5
 
is better put out as a single stretch of one byte per pixel
 
    -12 3 3 4 5 3 3 4 5 3 3 4 5
 
rather than as a mixture
 
     2 3 -2 4 5 2 3 -2 4 5 2 3 -2 4 5
 
because of the the one byte overhead in switching from horizontal runs
to one byte per pixel.
 
     One might make a simple rule that all two byte runs should be put
out as one byte per pixel, except that this is only true if you have
already opened a one byte per pixel stretch, otherwise 2 byte runs
should be put out as a horizontal run (count,byte).
 
     In any case run length encoding (RLE) only produces good compressio
when the data is even and unchanging.  Fractals of course are the
antithesis of unchanging data, so RLE does not work well on fractals or
any other natural data like video frames or even written work.
 
     In other words if you have a patch of red to compress use RLE.
Otherwise keep reading.
 
                           The LZW Algorithm
 
     There are four concepts that need to be understood.  They are,
 
     1.) INPUT CODE
     2.) TABLE INDEX
     3.) TABLE CODE
     4.) OUTPUT CODE
 
     The purpose of LZW compression is to turn the input code stream
into an output code stream of lesser length, and then later to
reconstruct the input code stream from the output code stream.  This
accomplishes the purpose of compression and decompression.
 
     The above four concepts can be visualized as follows:
 
     INPUT CODE             TABLE                 OUTPUT CODE
     STREAM              INDEX  CODE              STREAM
------------------------------------------------------------------------
CODE, CODE, CODE...     (INDEX, CODE)             CODE, CODE....
                        (INDEX, CODE)
                        (INDEX, CODE)
                              .
                              .
 
     CODE.  A code is a byte wide entity consisting of 8 bits and
ranging in decimal value from 0 to 255.  They can be considered as
characters, or as integer numbers, it doesn't matter.  They are NOT
signed integers however.  INPUT CODES, TABLE CODES and OUTPUT CODES are
all of this format.
 
     INDEX.  An index is a two byte wide SIGNED entity consisting of 16
bits and ranging in decimal value from -32768 to 32767.  It is called an
index because it refers to an earlier entry in the table.  For example,
if table entry number 400 consists of the (index,code) pair (312,5),
then 312 is called a table index and points BACKWARDS to table entry
312, and 5 is a table code generated in the process of compression.
Table entry 312 will itself have an (index,code) pair that will point to
an even earlier table entry, which will point to an earlier table entry
ad infinitum until you come back to the first table entries before which
there are none.  These first entries in the table are called the ROOT
AREA.
 
     If a table entry has -1 for an INDEX, that means that that table
entry does NOT refer to any earlier table entry.  -1's appear as table
indexes either
 
     1.> when a table entry is still unused or
 
     2.) when it resides in the ROOT AREA as described below.
 
     Table entries in the root area do not point to earlier table
entries because there are no earlier entries.
 
     Table entries that haven't been used yet don't point to earlier
table entries because they haven't been used yet!
 
     In either case the table index will be -1.
 
     0 by the way is a valid table index which points to the very first
table entry in the root area, table entry number 0.
 
     INPUT CODE STREAM.  The input code stream is the sequence of bytes
that you wish to compress into a hopefully shorter output code stream.
 
     OUTPUT CODE STREAM.  The output code stream is the sequence of
bytes that is the final product of compression, and is hopefully shorter
than the input code stream and RECONSTRUCTIBLE BACK into the input code
stream.  It's no good compressing something if you can't uncompress it!
 
     TABLE.  The table is a two column table of integer numbers of the
following form:
 
               TABLE
             POSITION  (INDEX,CODE)
------------------------------------------------------------------------
         -----    0    (-1   , 000)
           |      1    (-1   , 001)
           |      2    (-1   , 002)
           |      .       .
    ROOT AREA     .       .
           |    255    (-1   , 254)
           |    255    (-1   , 255)
           |    256    (-1   , 256)  Clear Code (CC)
         -----  257    (-1   , 257)  End of Information Code (EOI)
         -----  258    (INDEX,CODE)
           |    259    (INDEX,CODE)
        ADDED             .
    ENTRY AREA            .
           |    278    (INDEX,CODE)
           |    279    (INDEX,CODE)  <---TABLE POINTER to latest entry
           |    270    (-1   ,   0)  As yet unused entries
           |    271    (-1   ,   0)          "
           |     .           .
        -----  4095    (-1   ,   0)  End of table.
 
     The table consists of only the two columns of numbers, the table
position numbers are NOT in the table and are only shown for clarity.
 
     The table consists of 2 areas, the ROOT area and the ADDED ENTRY
AREA.
 
     ROOT AREA.  The root area consists of the first 258 entries in the
table numbered from 0 to 257.  The entries from 0 to 255 are initialized
at program start up to the numbers shown and do not change during the
duration of the program.  Entries number 256 and 257 are also set during
program start up and are used to control the process of compression and
decompression.  Their CODE values are otherwise arbitrarily chosen.
 
     Because their values are greater than 255, table codes must use 16
bit wide integers, even though codes have only 8 significant bits.
 
     The table indexes in the root area are set to -1's because they do
not point to any earlier table entry.  The table codes in the root area
are set to the basic alphabet of possible characters in the input code
stream, in this case the integers from 0 to 255.  The Codes for
CLEARCODE and END OF INFORMATION CODE are arbitrarily set to 256 and
257.
 
     This leaves table position 258 as the first empty position to ADD
new entries into the table when the compression process actually starts.
 
     ADDED ENTRY AREA.  At program start up positions 0 through 257 are
initialized as described above and a pointer is set to the last entered
entry in the table, in this case 257.  The remaining table entries are
initialized to (-1,0) as shown and usually there are a total of 4096
entries in the table numbered 0 to 4095, including the root area.
 
     During the process of compression the program scans the input code
stream and adds new entries to the table according to the LZW
compression algorithm which we will discuss at length below.  Each new
table entry consists of an (INDEX,CODE) pair which is determined by the
input code stream and the LZW algorithm.  As each new (INDEX,CODE) pair
is found the table pointer is incremented by by one and the new
(INDEX,CODE) pair is inserted into the table at the position pointed to
by the table pointer.
 
     In the following discussion we wish to prove the following points.
 
     1.) The table is constructed from and totally determined by the
input code stream.  The table is usually shorter than the input code
stream and so compression has taken place in the construction of the
table.  This is because each table entry can represent more than one
character in the input code stream.
 
     2.) From the table alone the input code stream can be
reconstructed.  Thus if you have the complete table you can reconstruct
the input code stream.
 
     3.) The right hand column of table codes can be reconstructed from
the left hand column of table indexes alone.  Thus if you know the left
hand table indexes you can recompute from that information alone the
right hand table codes and thus the input code stream.
 
     4.) Thus the table indexes can act as the final output compressed
data stream, or in other words, the output codes ARE the table indexes
from the added entry area.
 
     Do not get table codes confused with the output codes.  The table
INDEXES become the output codes.
 
     Because the table indexes can go up to 4095, they need 12 bits to
represent them.  Although they are computed as 16 bit wide integers,
when they are finally output to the output data stream, they are packed
at 12 bits each.
 
     Thus an input data stream of 8 bit integers gets compressed and
output as an output data stream of 12 bit integers.
 
     Since each table index can represent a unique string of varying
length, overall compression takes place.
 
     For example if a single 12 bit wide table index represents the 3
character string of 'ABC' or 24 bits, that is a compression factor of
2:1.  Three characters at 8 bits each are compressed into one character
of 12 bits.
 
     5.) The process of uncompressing the output code stream then works
as follows.
 
     First you must rebuild the table indexes from the output code
stream.  The ROOT AREA of the table is trivial as it is always the same.
 
     The ADDED ENTRY AREA is almost as trivial.  The table index column
of the added entry area IS the sequence of output codes, as the output
codes ARE the table indexes.  The table code column of the added entry
area is then recomputed from the table index column by means that will
become obvious when the full algorithm is discussed, and that completes
the rebuilding of the table from the output codes.  From the completed
table one can then reconstruct the original input code stream.
 
                         So how does LZW work?
 
     The idea behind LZW is that not only do some pixels repeat
themselves, but many SEQUENCES of pixels repeat themselves.
 
     For example take the input code stream (in bytes),
 
     ABCDEABCDEABCDEABCDEABCDE
 
     Such a sequence would not lend itself well to run length encoding
because no single pixel forms a run.  However the sequence of 5 pixels
ABCDE certainly form a run as they appear one right after the other 5
times.
 
     One might even suggest a super run length encoding that could seek
out repeating runs of SEQUENCES and perhaps encode a string like
 
     ABCDEABCDEABCDEABCDEFGHFGHFGHFGHFGH
 
stream as something like,
 
     5(ABCDE) 4(FGH)
 
     However LZW goes one step further and recognizes that the string
ABCDE can appear anywhere in the input code stream, not just in
consecutive runs, so why not just assign the string ABCDE a unique
number, say 234, and then every time that string appears replace it with
the single byte 234.
 
     So if 234 is assigned to ABCDE and 235 is assigned to FGH then the
above 20 byte string could be compressed into the 9 bytes
 
     234 234 234 234 234 235 235 235 235
 
     Of course the number 236 could be assigned to the string ABCDEABCDE
and 237 assigned to FGHFGHFGH, so now the same 20 byte input code stream
could be output as
 
     236 236 234 237 235
 
     Obviously you stand to create a lot of compression this way.
 
     Just as obviously the whole scheme is utterly preposterous, as you
could just assign one number, say 234, to the whole damn text and then
output only that one number!
 
     The problem is clearly keeping track of what string each number
stands for.  That is where a string table comes in.
 
     STRING TABLES
 
     A string table is merely a table of strings of varying length and
the code numbers that have been assigned to them.  In this case the
string is inserted into the table at the table entry position
corresponding to the code number assigned to that string so that the
code numbers do not also have to stored.
 
     For example let's say we have collected the following table of
strings.
 
      TABLE      TABLE
      POSITION   STRING ENTRY
------------------------------------------------------------------------
      232        SDF
      233        JHIKLNN
      234        ABCDE
      235        FGH
      236        ABCDEABCDE
      237        FGHFGHFGH
 
      With such a table in hand it would be easy enough to covert the
output code stream,
 
     236 236 234 237 235
 
back into the original input stream of
 
     ABCDEABCDE ABCDEABCDE ABCDE FGHFGHFGH FGH
 
     (Without the spaces of course.)
 
     Which is what we want.
 
     The problem of course with this method is that you have to have the
table in order to know what the code numbers stand for, and the space
you save by compressing your input code stream into the smaller sequence
of bytes is mostly lost in having to carry the whole table along with
you.
 
     This becomes obvious in the extreme example of assigning the whole
text one number and putting the whole text in the table as one entry.
You are actually up a byte from the original data, namely the damn code
number.  Not a good compression ratio.
 
     The key question is, can the table be reconstructed FROM the output
code stream?  If not then you have to carry the table with you to decode
the output code stream, and that wastes space.
 
     If you CAN reconstruct the table from the output code stream then
you can throw the tables away as you produce the output, and recover
them later when you want to go in the other direction.
 
     LZW takes this one step further in that the output code stream IS
the table, at least the left hand column of table indexes, and the right
hand column of table codes can be easily computed from the table
indexes.  Once you have both you can recompute the input code stream.
 
     So how does it work?
 
     THE LZW ALGORITHM
 
     The algorithm depends on the idea that any given input string
either is or is not in the table.  If it is in the table it is not put
in again.  If the string is not in the tale it is immediately put in.
 
     But let's look at how the algorithm gets its strings.
 
     Assume we want to compress the input code stream
 
     ABABABAB
 
     Remember the characters A and B are bytes to the computer and so
are really integer numbers somewhere between 0 and 255.  In this case A
is 61 and B is 62.
 
    The algorithm starts with the first character in the input code
stream and continues to grab characters from the input stream building a
string until it builds a string that is NOT in the table.
 
     The string table is first initialized with all possible single
character strings, this forms the root area of the table.  Let's
simplify this and just assume for the moment that there are only 4
different single character strings rather than 255, and let's call them
A, B, C, D.  Let's also forget the clear code and the end of information
code, so that our root area has just 4 entries in it, numbered 0 through
3.  and the first added entry will go in position 4.
 
     Then the table will look like this after initialization.
 
   CODE STRING
     0    A
     1    B
     2    C
     3    D
 
    Since ALL possible single character strings ARE in the table in the
root area the very first character that is taken from the input code
stream will definitely be in the table.  In this case that is the letter
A.
 
     So nothing is added to the table and the next character is grabbed.
This creates the string AB.  AB is NOT in the table so we add it to the
table at position 4.
 
     0   A
     1   B
     2   C
     3   D
     4   AB
 
     We now do something very strange, something that will probably earn
LZ&W their Nobel Prize in mathematics: we discard everything in the
string but the last letter of the string and start anew.
 
     So now we have string B which is definitely in the table in the
root area, so we go on and grab the next character in the output code
stream which is another A.  The string BA is NOT in the table so we add
it to position 5.
 
     0   A
     1   B
     2   C
     3   D
     4   AB
     5   BA
 
     We discard all but the last letter of the string leaving us with A
again which is in the table in the root area.  We grab the next
character in the input code stream forming AB.  Now this time AB is
already in the table from the last time we came across it, so we
continue on to the next character.  This gives us ABA.  ABA is NOT in
the table so we enter it in position 6.
 
     0   A
     1   B
     2   C
     3   D
     4   AB
     5   BA
     6   ABA
 
    Keeping the last letter A, we grab the next character in the input
code stream forming AB.  This is in the table, so we get the next
character forming ABA which is ALSO now in the table.  Adding the next
character we get ABAB which is NOT in the table so we add it at position
7.
     0   A
     1   B
     2   C
     3   D
     4   AB
     5   BA
     6   ABA
     7   ABAB
 
     At this point you can clearly see how to build the rest of the
table from the input code stream.  You do this until you run out of
input code stream or table room at which point you clear the table and
start over again with a new table.
 
     Let's assume we ran out of input codes, so as usual we discard all
but the last character of the last string and that leaves B.  Since
there are no further codes to add to B, it becomes the last string.
 
     0   A
     1   B
     2   C
     3   D
     4   AB
     5   BA
     6   ABA
     7   ABAB
     8   B
 
     And there we have our completed table.
 
     Now here is where the brilliance comes in.  Each string is added to
the table because it was not already in the table.  But each string is
built up one character at a time so each one of its predecessor strings
HAD TO BE IN THE TABLE for that string to get as long as it did before
it got put in the table.
 
     In other words its that LAST CHARACTER at the end of each string in
the table that made that string NOT BE in the table.  This means that if
you take any string in the table and remove its last character you must
have a string that was already in the table earlier.
 
     Thus each string can be split up into its last character and an
index into an earlier entry in the table.  And voila you have the
(INDEX,CODE) form of the string table.
 
   TABLE     TABLE IN        TABLE IN
  POSITION   STRING FORMAT   INDEX,CODE FORMAT
     0       A                 -1   A
     1       B                 -1   B     ROOT AREA
     2       C                 -1   C
     3       D                 -1   D
     4       AB                 0   B         |
     5       BA                 1   A     ADDED ENTRY
     6       ABA                4   A         |
     7       ABAB               6   B
     8       B                  1   x     x means doesn't matter.
 
     So the first thing we need to demonstrate is that the input code
stream can be reconstructed from the table, particularly from the table
indexes in the added entry area, 0 1 4 6 1.
 
     The first thing we must realize is that each table index refers to
an earlier complete string already in the table.
 
     For example in position 7 there is a (6,B).  The 6 refers to
position 6 where there is a (4,A).  The 4 refers to position 4 where
there is a (0,B) and the 0 refers to position 0 where there is a (-1,A)
The -1 means you are in the root area and the backwards search ends.
 
     So the (6,B) in position 7 means that position 7 contains a string
which is a B preceded by the string in position 6.  But the string in
position 6 is an A preceded by the string in position 4.  The string in
position 4 is a B preceded by the string in position 0.  And the string
in position 0 is an A preceded by nothing.
 
     So what string does table index 6 refer to?  This is the way you
figure it out.
 
     The | means refers to.  This says that index 6      (6,B)   pos 7
refers to an A preceded by what's in position 4,          |
what's in position 4 is a B preceded by what's in      (4,A)     pos 6
position 0, and what's in position 0 is an A            |
preceded by nothing 'cuz its root.    This gives     (0,B)       pos 4
us the string ABA for index 6.  Looking in the        |
string form of the table we can see that is       (-1,A)         pos 0
right.
 
     The entire string in position 7 is (6,B) or ABAB. The index 6 alone
refers to a prior string of ABA, specifically the string in POSITION 6,
that's why its called index 6, because it refers backwards to position
6.
 
     In the same way you can compute the string values for each of the
indexes in the output code stream 0 1 4 6 1.
 
   Table      Table
   Position   Index
     4        INDEX 0 = A
     5        INDEX 1 = B
     6        INDEX 4 = AB
     7        INDEX 6 = ABA
     8        INDEX 1 = B
 
     Thus 0 1 4 6 1 = A B AB ABA B which is what we started with, minus
the spaces which I added for clarity.
 
            GETTING THE TABLE CODES FROM THE TABLE INDEXES.
 
     So what happens if you don't have the table codes, but only the
table indexes?  Well you remember that when a string was built that was
NOT in the table, it was then put in the table at the next available
position.  Then all characters before the last character were discarded
and the last character was kept as the first character of the next
string to be built.
 
     More accurately once a string is found that is NOT in the table,
say ABAB, this is represented by an (index,code) pair such as (6,B).
The part of the string corresponding to the index is then thrown away
and the remaining single table code becomes the first character of the
next string.
 
     Once a string is built starting with that first character it too is
added to the table and the last character only is kept to become the
first character of the next table entry.
 
     Therefore it is obvious that the last character of each table entry
is the first character of the next table entry!  Thus the table is
always of the form,
 
   TABLE     TABLE IN
  POSITION   STRING FORMAT
     0       A
     1       B     (ROOT AREA)
     2       C
     3       D
     4       A...D            I have used random characters here for
     5       D...C            demonstration purposes only.  The last
     6       C...B            character of the string in position 4 is
     7       B...D            the first character of the string in
     8       D                position 5, etc.  The ... means an
                              arbitrary number of 0 or more
                              intervening characters.
 
     So how does this help us find the codes from the indexes alone?  It
should be clear from an earlier discussion that from the indexes AND
codes the original string represented by any (index,code) pair can be
reconstructed by following the indexes back to root and accumulating the
codes of each touched upon entry along the way.  But if you don't have
the codes how can you accumulate them?!
 
     Well the magic is although you may have no codes to accumulate, you
always have the FIRST code of any string in the table because the first
code of a string is in the root area whose codes are ALWAYS known.  Thus
from the indexes alone you can always find the FIRST character of any
string represented by any index.
 
     Here's our complete table again without the codes.  Remember root
codes are always known.
 
   TABLE     TABLE IN        TABLE IN
  POSITION   STRING FORMAT   INDEX,CODE FORMAT
     0       A                 -1   A
     1       B                 -1   B     ROOT AREA
     2       C                 -1   C
     3       D                 -1   D
     4       AB                 0   ?         |
     5       BA                 1   ?     ADDED ENTRY
     6       ABA                4   ?         |
     7       ABAB               6   ?
     8       B                  1   x     x means doesn't matter.
 
     For example, in table position 7 you have (6,x).
 
     Index 6 refers to position 6 which contains (4,?), which refers to
position 4 which contains (0,?), which refers to position 0 which
contains (-1,A).
 
     So the string in position 7 MUST start with an A!
 
     But this first character is the LAST character of the immediate
preceding string in the table, in this case the string in position 6.
 
     By definition the last character of a string is that string's table
CODE.  So the first character of the string in position 7 is the last
character of the string in position 6 and is thus also 6's table CODE.
 
     (Don't get table codes confused with output codes.  A string's
table code is the last character of the string.  A string's output code
is the table INDEX for that string!)
 
     Thus by finding the first character of any index string, you can
find the table code of the immediately previous string.
 
     Going back to our previous example, let's assume we have only the
indexes and no codes, except the root codes which are known.  The
missing codes are represented by ?'s in the below table.
 
   TABLE     TABLE IN        TABLE IN
  POSITION   STRING FORMAT   INDEX,CODE FORMAT
     0       A                 -1   A
     1       B                 -1   B     ROOT AREA
     2       C                 -1   C
     3       D                 -1   D
     4       AB                 0   ?         |
     5       BA                 1   ?     ADDED ENTRY
     6       ABA                4   ?         |
     7       ABAB               6   ?
     8       B                  1   x     x means doesn't matter.
 
     In position 8 the index 1 points directly to position 1 in in the
root area which has a known code of B.  Thus B is the first character of
the string in position 8 and therefore the last character of the string
in position 7.  Thus we know the code for position 7, its a B!
 
     In position 7, the index 6 points to position 6 whose index of 4
points to position 4 whose index of 0 points to position 0 which being
in the root area has a known code of A.  Thus A is the first character
of the string in position 7 and so also the last character of the string
in position 6.  Thus we have the code for 6, its an A.
 
     Visually any index can be followed back to root   (6,?)   pos 7
just like we did before, the only difference being      |
that there are no known codes to accumulate until    (4,?)     pos 6
we get to root where there will be just one known     |
code.  That code is the first character of the     (0,?)       pos 4
index we just traced back and the last character    |
of the index just before it in the table.       (-1,A)         pos 0
 
     Thus by starting with the last added index in the table, and
tracing it back to its first character in the root area, we can find the
table CODE of the immediately prior index.  Repeating the process for
that index, we can then fill in the entire table.
 
     Once the table is filled in completely, we can then start a second
pass and accumulate the complete strings for each index using the codes
we just found, and then concatenate those strings together to form the
original input code stream.
 
     Totally amazing, isn't it?
 
     Homer