Text compression and Tile compression: Difference between pages

From NESdev Wiki
(Difference between pages)
Jump to navigationJump to search
 
(→‎Pokémon LZ: Rangi in gbdev claims this algorithm is in G2, not G1)
 
Line 1: Line 1:
'''Text compression''' refers to techniques that allow fitting more text data into a smaller space.
'''Tile compression''' refers to techniques that allow fitting more graphics data into a smaller space.
Programs using [[CHR ROM vs. CHR RAM|CHR ROM]] cannot use compressed tiles, as their tile data must be stored in the PPU's native format.
But programs using CHR RAM can process tile data while copying it from PRG ROM to CHR RAM, and this processing allows storing more tiles in the same space.


== Dictionary compression and DTE ==
== <span id="RLE">Run-length encoding</span> ==
[[wikipedia:Run-length encoding|Run-length encoding]] transforms runs of identical bytes into a shorter sequence of bytes that specifies the length of the run.


Dictionary compression is a technique where part of the character set is
In NES tile data, byte-level run-length encoding works well when a row of 8 pixels in a tile is identical to the row above it.
reserved to denote references to a "dictionary". If the byte falls into this range,
It also works well for [[nametable]] data because a horizontal run of blank tiles becomes a single tile.
a string is copied from the dictionary rather than the byte being copied verbatim.
As this compression technique does not require knowledge of past data,
it is very easy to implement on machines having little memory like the NES.


Sometimes the compression may be applied recursively, where the dictionary string
Pixel-level run-length encoding is much slower but can achieve impressive results within a tile.
itself may contain references to other dictionary strings.


Dual-tile encoding, or DTE for short, is a special case of dictionary compression. It is also known as [https://en.wikipedia.org/wiki/Byte_pair_encoding byte-pair encoding], or digram coding.
There are several different RLE data formats.
In this case, the dictionary strings are all two bytes long.


Example implementations:
=== PCX ===
The [[wikipedia:PCX|PCX]] image format became popular on PC.


=== Simon's Quest (NES) ===
{| class="wikitable"
! Value || Meaning
|-
| 00-BF || Write this byte to the output.
|-
| C0-FF || Read another byte, and write it to the output ''n'' - 192 times.
|}


In ''Simon's Quest'' (NES) (all versions), the font is 256 characters long, although only a small part of that is actual text symbols used in dialog text. The following bytes have a special meaning:
=== PackBits ===
* FF = End of text rendering.
The [[wikipedia:PackBits|PackBits]] format was invented by Apple for MacPaint.
* FE = Newline.
It is also used in TIFF files and a few homebrew releases by [[User:Tepples|Damian Yerrick]].
* FD = Save current string pointer. The next byte determines the string number; all consecutive characters will be read from that string rather than the current one.
* FC = Denotes the end of a substring. Restores the string pointer saved by opcode FD.
* 00..FB = Print this character.
Dictionary strings are arbitrary length. There is room for only one saved string pointer, so substrings can not refer to other substrings, unless it is to terminate the entire string. The substring mechanism is used in the Japanese diskette version of the game. The cartridge versions of the game also support this mechanism, even though the actual text data does not utilize it.


=== Chrono Trigger (SNES) ===
{| class="wikitable"
! Value || Meaning
|-
| 00-7F || Copy ''n'' + 1 bytes from input to output.
|-
| 80 || No operation
|-
| 81-FF || Read another byte, and write it to the output 257 - ''n'' times.
|}


In ''Chrono Trigger'' (SNES) (all versions), the font is 768 characters long, but the strings are 8-bit. The following special bytes are defined:
=== Konami RLE ===
* 00 = End of string.
This format is used in the U.S. version of ''Contra'', and the Japanese version of ''Simon's Quest''.
* 01 = Read next byte; print character (byte+0x100).
It can be decoded and encoded with the Python program [https://github.com/sobodash/graveyardduck/blob/master/graveduck.py GraveyardDuck].
* 02 = Read next byte; print character (byte+0x200).
Compression ratio is more or less identical to PackBits.
* 03..20 = Various text effects, references to item tables, and references to party member names.
* 21..xx = Reference to a dictionary string. xx is a compile-time constant that determines the length of the dictionary. This number is 0x9F in the USA version and 0x3F in the Japanese version.
* xx+1..FF = Print this character.
Dictionary strings have a length limit of 255 bytes. They are not applied recursively. The dictionary strings are stored in length-data format without an end delimiter.


=== Bisqwit's ppu_read_buffer test (NES) ===
{| class="wikitable"
! Value || Meaning
|-
| 00-80 || Read another byte, and write it to the output ''n'' times.
|-
| 81-FE || Copy ''n'' - 128 bytes from input to output.
|-
| FF || End of compressed data
|}


Bisqwit's emulator test ROM ppu_read_buffer ([http://forums.nesdev.org/viewtopic.php?t=8847])
''Blades of Steel'' uses a subset of this format reserving a special value for jumping to a new PPU address. See: [https://datacrystal.romhacking.net/wiki/Blades_of_Steel:ROM_map Blades of Steel DataCrystal reference]
uses a combination of DTE and dictionary for text compression.
For DTE, the compression is applied recursively, in both DTE bytes.
Recursion for the first DTE byte is implemented using a software stack, and for the second DTE byte it is implemented as tail recursion.
Additionally the string may contain a jump label to another string,
permitting the reuse of the same string in different test descriptions.


== Bitrate reduction methods ==
=== GBA RLUnComp ===
The BIOS of the Game Boy Advance and Nintendo DS contains a decompressor for an RLE format very similar to PackBits and Konami.
As described in [http://problemkaputt.de/gbatek.htm#biosdecompressionfunctions GBATEK], it has a 4-byte size header followed by this:


=== Fixed-bit encoding ===
{| class="wikitable"
! Value || Meaning
|-
| 00-7F || Copy ''n'' + 1 bytes from input to output.
|-
| 80-FF || Read one byte from input and write it to output ''n'' - 125 times.
|}


When the character set is small, such as 64 characters at most,
=== PB53 ===
strings could be encoded in a bitstream that packs 6 bits per character
This codec was conceived by Damian Yerrick as an alternative to PackBits for the [[Action 53]] multicart, and a Python packer and 6502 unpacker are included in the Action 53 menu distribution.
rather than 8 bits per character. This results in 20 % reduction of data size.
Unlike freeform RLE formats such as Konami and PackBits, PB53 operates on 16-byte units, making it easy to divide the decompressed data into fixed-size packets to be sent to the PPU during vblank while rendering is turned on.
Like [[wikipedia:LZSS|LZSS]], PB53 uses [[wikipedia:unary coding|unary coding]] on the lengths of literal runs to save on overhead from switching between literal and run modes.
This means that like LZSS, it has a worst case expansion of 12.5%, but it works fairly well on real tile data and OK on nametable data, which have shorter runs than the high-resolution files for which PackBits was designed.
It also has a special mode to accommodate the layout of Shiru's NROM games ''LAN Master'', ''Lawn Mower'', and ''Chase'', which have many identical tiles between the two pattern tables to allow tile animation.


Main article: [[Fixed Bit Length Encoding]]
Each tile consists of several 8-byte planes, two planes in the NES implementation.
For the first plane in a tile:
{| class="wikitable"
! Value || Meaning
|-
| 00-7F || PB8: After this control byte, copy the following byte from input to output. Then, for each bit in the control byte from 6 to 0, if the bit is 1, repeat the previous byte; otherwise, copy another byte from the input. This is somewhat similar to how control bytes are formatted in LZSS.
|-
| 80 || Write eight $00 bytes.
|-
| 81 || Write eight $FF bytes.
|-
| 82 || Copy one tile (16 bytes) starting one tile back. (Used for a repeated tile, such as the unused tiles in many games.)
|-
| 83 || Copy one tile starting one segment back, usually 4096 bytes. (Used for pattern tables that share tiles, as seen in several Shiru games. The decoder switches between two instances one segment apart, each with its own input stream and output buffer.)
|-
| 84 || Write sixteen $00 bytes. (Solid color 0)
|-
| 85 || Write eight $FF bytes then eight $00 bytes. (Solid color 1)
|-
| 86 || Write eight $00 bytes then eight $FF bytes. (Solid color 2)
|-
| 87 || Write sixteen $FF bytes. (Solid color 3)
|}
For other planes:
{| class="wikitable"
! Value || Meaning
|-
| 00-81 || Same as first plane
|-
| 82 || Copy previous 8 bytes. (Used for 1-bit tiles with colors 0 and 3.)
|-
| 83 || Copy previous 8 bytes, bit-inverted. (Used for 1-bit tiles with colors 1 and 2.)
|}


=== Variable-bit encodings ===
PB8 is the same as PB53 except that bit 7 of the control byte is not special: it still means repeat the previous byte.
It is used in ''Haunted: Halloween '85'' and ''Haunted: Halloween '86''.
From [https://github.com/LIJI32/SameBoy/commit/4504de828a188b520a324687a1181af6c45a7e3a July 2019] to [https://github.com/LIJI32/SameBoy/commit/cb738190be6abca03d1cd3265440908107168a47 May 2020], the workalike boot ROM included with the Game Boy Color emulator SameBoy used PB8 for the emulator's logo.


In variable-bit encodings different symbols are stored in different number of bits.
PB16 is similar to PB8 with one change: each 1 bit means a repeat of the value two bytes back. (Bit 7 is not special, unlike in PB53.)
This distance of two bytes is better for Game Boy and Super NES tile data and Super NES tile maps.
It is used in the Game Boy ports of 240p Test Suite and ''Magic Floor''.


Example code in C for storing and retrieving variable-bit length integers:
PB8 and PB16 inspired the creation of PB12 for the SameBoy emulator's boot ROM.
So-called "PB12" by NieDzejkob (Jakub Kądziołka) is tuned to the statistics of the 3-color antialiased version of the SameBoy logo.
Control bytes are interleaved with literal bytes.
Control byte 00000001 is a terminator and thus must not occur in the byte stream.
Otherwise, each 2 or 4 bits of the control correspond to a 4-bit word.
{| class="wikitable"
! Value || Meaning
|-
| 00 || copy next byte from input
|-
| 0100 || Copy 1 byte back, ORed with the same byte shifted left 1
|-
| 0101 || Copy 1 byte back, ANDed with the same byte shifted left 1
|-
| 0110 || Copy 1 byte back, ORed with the same byte shifted right 1
|-
| 0111 || Copy 1 byte back, ANDed with the same byte shifted left 1
|-
| 10 || Copy 2 bytes back
|-
| 11 || Copy 1 byte back
|}


<pre><nowiki>#include <limits.h> // CHAR_BIT
=== RLEINC ===
This RLE variant was used by Joel Yliluoma in the Simon's Quest retranslation project.
It is very efficient when compressing nametables, which often contain redundancy in more complex forms than simple runs of repeating bytes.
Examples include brick walls, which repeat two tiles, and complex graphics that is formed from an ascending series of successive tiles.
For bitmap compression, it is slightly inferior to simpler RLE methods.


void PutBits(void* memory, unsigned* bitpos, unsigned long V, unsigned nbits)
{| class="wikitable"
{
! Value || Meaning
    unsigned char* buffer = (unsigned char*)(memory);
|-
    while(nbits > 0)
| 00&ndash;3F || LIT: Copy (n+1) bytes from input to output ''backwards''
    {
|-
        unsigned bytepos = *bitpos/CHAR_BIT, bits_remain = CHAR_BIT-*bitpos%CHAR_BIT, bits_taken = CHAR_BIT-bits_remain;
| 40          || END: End of stream
        unsigned bits_to_write = std::min(nbits, bits_remain);
|-
        unsigned value_mask    = (1 << bits_to_write)-1;
| 41&ndash;7F || SEQ: Read next byte b. Put b, (n&minus;0x3F) times; add 1 to b after each iteration
        unsigned value_to_write = V & value_mask;
|-
        buffer[bytepos] = (buffer[bytepos] & ~(value_mask << bits_taken)) | (value_to_write << bits_taken);
| 80&ndash;9F || DBL: Read next byte b1, and next byte b2. Put b1, (n&minus;0x7D) times; swap b2 and b1 after each iteration
        V >>= bits_to_write;
|-
        nbits  -= bits_to_write;
| A0&ndash;FF || RUN: Read byte b. Put b, (0x101&minus;n) times.
        *bitpos += bits_to_write;
|}
    }
}


unsigned long GetBits(const void* memory, unsigned* bitpos, unsigned nbits)
A compressor for this scheme can be found at http://bisqwit.iki.fi/src/nes-ppu_rleinc_compress.php.txt (PHP),
{
and a decompressor at http://bisqwit.iki.fi/src/nes-ppu_rleinc_v2b.inc (CA65).
    const unsigned char* buffer = (const unsigned char*)(memory);
    unsigned long result = 0, shift=0;
    while(nbits > 0)
    {
        unsigned bytepos = *bitpos/CHAR_BIT, bits_remain = CHAR_BIT-*bitpos%CHAR_BIT, bits_taken = CHAR_BIT-bits_remain;
        unsigned bits_to_take = std::min(nbits, bits_remain);
        unsigned v = (buffer[bytepos] >> bits_taken) & ((1 << bits_to_take)-1);
        result |= v << shift;
        shift += bits_to_take;
        nbits -= bits_to_take;
        *bitpos += bits_to_take;
    }
    return result;
}</nowiki></pre>


Ideally, you would store common symbols using few bits and infrequent symbols using more bits.
JRoatch made [http://forums.nesdev.org/viewtopic.php?p=161617#p161617 PBJ], which adds the PB8 mode from PB53 and a common-byte mode to a modified RLEINC.
Which brings us to&hellip;


==== Huffman coding ====
=== Bit-based RLE ===
Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream, that contains integers of different bit widths.


A special case of variable-bit encodings is [https://en.wikipedia.org/wiki/Huffman_coding Huffman coding].  
When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, a simple reference RLE algorithm (PackBits) gets a 12% reduction in data size.
However, if the algorithm is modified as indicated below, a 21% reduction is achieved. For comparison, the graphics specialized algorithm in PB53 achieves 25% for that data set. Tokumaru compression manages to reduce the data by 41%.


Huffman coding defines the optimal coding for a given character set based on a table of frequencies that each character is used.
{| class="wikitable"
! Bit sequence  || Meaning
|-
| 0000 || End of stream.
|-
| 0nnn || Copy the next n&times;8 bits, i.e. ''n'' bytes, to the output.
|-
| 1nnn || Read the next 8 bits, and output this byte n+2 times.
|}


==== Arithmetic coding ====
A possible reason why bit-based compression methods are not popular on the NES is that bit-shifts are cumbersome and slow with the 6502 CPU, as it can only shift one bit at a time.
The above algorithm is still relatively simple to implement, as it operates on units of 4 bits for the input and full bytes for the output.
Coincidentally, it also produced the best compression out of all bit-based RLE algorithms that were brute-force-tested for that dataset.


Huffman coding is also a special case of arithmetic coding. In arithmetic coding, each symbol
=== NES Stripe Image RLE ===
is represented by a range of binary fractions rather than a particular number of bits.
A RLE format mostly used by Nintendo for use in their Arcade ports as well as their Mario games, Also used in some homebrew games!
As arithmetic coding was covered by a number of patents up to 1993, and it is calculation intensive,
chances are no game uses it, unless part of e.g. LZMA compression, which was released in 1999.


== LZ based methods ==
Format: dest, len, data, [dest, len, data, ]* end
 
dest: PPU destination address (2 bytes, big endian), to be written to $2006
 
len: Length (Byte) of PPU Buffer Data:
 
{| class="wikitable"
! Length Value || Meaning
|-
| 00-3F || Literal to right: Copy ''n'' + 1 bytes to video memory addresses increasing by 1
|-
| 40-7F || Run to right: Copy one byte ''n'' - 63 times to video memory addresses increasing by 1
|-
| 80-BF || Literal down: Copy ''n'' - 127 bytes to video memory addresses increasing by 32
|-
| C0-FF || Run down: Copy one byte ''n'' - 191 times to video memory addresses increasing by 32
|}
 
data: PPU Data to display
 
end: End of Data marker. Early games use $00; later games (particularly those with CHR RAM) may use any value with bit 7 set ($80-$FF) to allow writing to the first 16 tiles of video memory.
 
See also [https://forums.nesdev.org/viewtopic.php?f=22&t=15440 Popslide], a video memory update buffer framework using this format
 
=== SNES Stripe Image RLE ===
Same RLE format used by Nintendo as above, but for SNES.
 
Format: dest, len, data, end
 
dest: PPU Destination: $2116 and $2117
 
len: Length (Word) of PPU Buffer Data:
 
{| class="wikitable"
! Length Value || Meaning
|-
| 0000-3FFF || Copy ''n'' + 1 bytes to PPU buffer
|-
| 4000-7FFF || Copy ''n'' + 1 bytes to PPU buffer, with RLE
|-
| 8000-BFFF || Copy ''n'' + 1 bytes to PPU buffer, increment 32 bytes
|-
| C000-FFFF || Copy ''n'' + 1 bytes to PPU buffer, with RLE, increment 32 bytes
|}
 
data: PPU Data to display in 2-byte increments (or word increments)
 
end: Unlike the NES version, the end byte is $FF or $FFFF.
 
== Pixel based ==
=== Codemasters ===
This is a Markov chain (predictive) algorithm that packs predictions in varying number of bits.
It works on packets measured in whole tiles, and compresses mostly pixel by pixel, making it slow.
[http://forums.nesdev.org/viewtopic.php?p=48658#p48658 Explained on forum].
 
==== <span id="Tokumaru">Tokumaru</span> ====
Tokumaru discovered an improvement to the way the frequency tables are changed in Codemasters algorithm,
and [http://forums.nesdev.org/viewtopic.php?p=54230#p54230 released] the compressor and decompressor as open source.
And open-source rewrite of the encoder and decoder with slightly better
performance can be downloaded at: http://bisqwit.iki.fi/source/tokumaru.html
 
The compressed data begins with a byte that tells how many tiles it encodes. 256 is maximum.
Technically this byte can be ignored, if you already know how many tiles you are going to decode.
 
After the byte, any number of blocks follows. Each block begins with a ''color description table''.
This table tells how to encode transitions between colors in tiles belonging to this block.
 
There are four elements in this table, from 3 to 0, for color ''n''.
Each element begins with a two-bit integer ''ncolors[n]'',
that tells how many different colors that may follow a pixel of this particular color.
After the number of colors, comes a pivot color ''a'' that is encoded as follows:
{| class="wikitable"
! Value      || Applicable when  || Meaning
|-
| nothing    || ''ncolors[n]=0'' || No pivot color
|-
| 1 bit: 0  || ''ncolors[n]>0'' || Pivot color ''a'' is 1 if ''n < 1'', 0 otherwise.
|-
| 2 bits: 10 || ''ncolors[n]>0'' || Pivot color ''a'' is 2 if ''n < 2'', 1 otherwise.
|-
| 2 bits: 11 || ''ncolors[n]>0'' || Pivot color ''a'' is 3 if ''n < 3'', 2 otherwise.
|}
The table of transition colors is then calculated using ''n'', ''a'', and the number of colors ''ncolors[n]''.
First, two other colors ''b'' and ''c'' are calculated. They are the first color indexes that are neither ''n'' nor ''a''.
E.g. if ''a=2'' and ''n=1'', ''b'' and ''c'' become 0 and 3 respectively.
{| class="wikitable"
! When ''ncolors[n]'' is || Table of possible transition colors ''next[n]'' is
|-
| 0 || []
|-
| 1 || [a]
|-
| 2 || [b,c]
|-
| 3 || [a,b,c]
|}
For compression purposes, ideally ''ncolors[n]'' should be chosen to be the numbers of distinct colors
that actually can follow the color ''n'', based on measuring the tile data, and, if ''ncolors[n]=3'',
the pivot color ''a'' should be the color that is transitioned into most often from color ''n''.
This transition in tile data will be encoded in two bits, while the other transitions
are encoded in three bits. For other values of ''ncolors[n]'', the choice of pivot color
is mandated by the actual possible colors.
 
After the ''color description table'', comes ''tile data'' encoding 16 bytes, or 8&times;8 pixels.
Each tile is comprised of eight ''rows'' of pixels.
Each pixel row begins with 1 bit, that tells whether the row is to be copied.
If the bit is set, the previously decoded row is duplicated,
and no other data is encoded for this pixel row.
At the start of the block, the "previously decoded row" is assumed to contain zero bytes.
The previous row is not reset between different tiles, unless a new block begins.
If the bit was clear, eight pixels follow for ''x'' coordinates 0 to 7.
Each pixel is encoded as follows, depending on the color ''c'' of the previous pixel:
 
{| class="wikitable"
! Value      || Applicable when  || Meaning
|-
| 2 bits      || ''x = 0''        || The color of the first pixel ''c'' is encoded as a 2-bit integer.
|-
| nothing    || ''ncolors[c]=0'' || ''c'' does not change, and nothing is encoded.
|-
| 1 bit: 1    || ''ncolors[c]>0'' || ''c'' does not change from previous pixel.
|-
| 1 bit: 0    || ''ncolors[c]=1'' || ''c'' becomes ''next[c][0]''.
|-
| 2 bits: 00  || ''ncolors[c]>1'' || ''c'' becomes ''next[c][0]''.
|-
| 2 bits: 01  || ''ncolors[c]=2'' || ''c'' becomes ''next[c][1]''.
|-
| 3 bits: 010 || ''ncolors[c]=3'' || ''c'' becomes ''next[c][1]''.
|-
| 3 bits: 011 || ''ncolors[c]=3'' || ''c'' becomes ''next[c][2]''.
|}
 
After each full tile, if there are still remaining tiles
to be decoded, comes one bit that tells what comes next.
Value 1 means a new block start, with a new ''color description table''. Value 0 means that more ''tile data'' will follow.
 
== Common byte ==
=== Oracle common byte ===
This codec, used in ''The Legend of Zelda: Oracle of Seasons'' and ''The Legend of Zelda: Oracle of Ages'' according to [http://forums.nesdev.org/viewtopic.php?p=130355#p130355 Dwedit], is roughly comparable to RLE in complexity.
For each 16-byte block, the compressor determines the most common byte in that block.
The compressed data for each block starts with a 2-byte mask, then the most common byte, then other bytes in order that aren't the most common.
 
To decode a block: First read the two-byte mask. If the entire mask is zero, set the bit corresponding to the first byte to true. Then read the most common byte.
For each bit in the mask, if the bit is 1, write the most common byte to output; otherwise, copy one byte.
 
Maximum expansion is 12.5% for any block that has 16 different bytes in it: two bytes of mask and 16 bytes of data.
 
== LZSS ==
A lot of games on platforms after the NES use [[wikipedia:LZ77|LZ77]] family compression methods such as [[wikipedia:LZSS|LZSS]], which generalizes run-length encoding to allow copying data from several bytes ago, not just the previous byte.
Few NES games use LZ77 because the NES's limited work RAM and limited access to video memory make it difficult to resolve back references.
(Fewer still use LZW or any other LZ78-based method because of patents that subsisted through the NES, Super NES, and Nintendo 64 eras.)
 
In LZSS, the mask contains 8 commands, each either to copy a literal byte or to copy a back reference. determines whether the next 8 things are literal bytes or back references.
Once all commands have been processed, read another mask.
 
:Read a mask byte from input.
:For each bit in the mask byte:
::If the bit is 0, this is a literal:
:::Copy one byte from input to output.
::Otherwise, this is a back-reference:
:::Read and decode a length and distance from input. These will be positive integers.
:::Copy ''length'' bytes from the previous output, ''distance'' bytes before now, to output.
 
LZSS flavors vary mainly in how they encode lengths and distances.
 
=== LZ77UnComp ===
The BIOS of the Game Boy Advance and Nintendo DS has a built-in decompressor for a straightforward LZSS flavor with 3- to 18-byte references into the previous 4096 bytes of output.
The data format is described in [http://problemkaputt.de/gbatek.htm#biosdecompressionfunctions Martin Korth's GBATEK].
 
Caution: In data intended for decompression directly to the GBA or DS video memory, the second byte of a 16-bit word cannot refer to the first byte of the same word.
So the encoder must not write a run with distance 1 that starts at an odd offset.
 
=== Oracle LZ ===
This flavor of LZSS is used in ''The Legend of Zelda: Oracle'' games for Game Boy Color according to [http://forums.nesdev.org/viewtopic.php?p=130355#p130355 Dwedit].
 
An entire compressed block can be compressed with one of two subtypes of Oracle LZ: short word mode and long word mode.
Short words use references of 2 to 8 bytes into the previous 32 bytes of output, and long words use references of 3 to 33 bytes into the previous 2048 bytes.
(A length value of 0 means read an additional byte and use that as the length.)
Only short words would work very well on NES.
 
=== Pokémon LZ ===
 
This compression scheme is used in the second-generation Pokémon games on the Game Boy. It is used for bitmap compression.
 
A byte n is read and split into two parts: code = n >> 5, and c = n &amp; 0x1F. Byte 0xFF marks the end of the stream.
Otherwise the ''code'' is interpreted as follows:
 
{| class="wikitable"
! code  || Meaning
|-
| 0 || Copy the next ''c'' + 1 bytes to the output.
|-
| 1 || Read another byte, and write it to the output ''c'' + 1 times.
|-
| 2 || Read another byte b1 and byte b2, and write byte b1 to the output ''c'' + 1 times, swapping b1 and b2 after each write.
|-
| 3 || Write a zero byte (0x00) to the output ''c'' + 1 times.
|-
| 4 || Read byte b. If b < 0x80, then read byte b2; offset is b&times;256+b2 bytes from the output stream beginning. Else offset = b bytes ''behind'' from the current output stream end. Copy ''c'' + 1 bytes from the output stream at offset to the output.
|-
| 5 || Read byte b. If b < 0x80, then read byte b2; offset is b&times;256+b2 bytes from the output stream beginning. Else offset = b bytes ''behind'' from the current output stream end. Copy ''c'' + 1 bytes from the output stream at offset to the output, ''reversing the bits in each byte''.
|-
| 6 || Read byte b. If b < 0x80, then read byte b2; offset is b&times;256+b2 bytes from the output stream beginning. Else offset = b bytes ''behind'' from the current output stream end. Copy ''c'' + 1 bytes from the output stream at offset to the output, ''decrementing the read position after each write'' (i.e. reverse the data).
|-
| 7 || Read another byte b. Change code = (c >> 2), and change c = (c &amp; 3) &times; 256 + b. Re-interpret code according to this table.
|}
 
=== Chrono Trigger LZ ===
 
This compression scheme is used in Square&lsquo;s Chrono Trigger for the SNES for compression of graphics and various data.
 
The data consists of packets.
Each packet begins with a 16-bit offset, which gives the packet ending offset relative to the beginning of compressed data. At that offset, there is always a control byte.
The first control byte sets the size of offsets: If the byte is &lt; 0xC0, then ''offsetsbits''=12. Else ''offsetbits''=11. Interpreting the offsetbits is done only once.
The higher-order two bits in the control bytes of all other packets are ignored.
The ''counter'' is assigned a default value of 8.
 
The decompression loop goes as follows:
# If the packet end has been reached, a control byte is read, and ''counter'' is assigned the low 6 bits of that byte (i.e. ''counter'' = byte &amp; 0x3F). If the ''counter'' is zero, the decompression is complete and ends there. If the counter was nonzero, a new 16-bit packet end offset is read.
# If the end of the packet has not yet been reached, a ''mask'' byte is read.
#* If the mask byte is zero, then next eight bytes are copied verbatim to the output. The counter is not affected.
#* If the mask byte was nonzero, its each ''bit'' is read, beginning from the lowest-order bit. The number of bits to be read is determined by ''counter'' (which is in range 1&mdash;63, i.e. it can be both smaller and greater than eight).
#** If the ''bit'' is zero, a single byte is copied verbatim to the output.
#** If the bit is nonzero, a 16-bit number is read from the input. ''offset'' becomes the lowest-order ''offsetbits'' from that number, and ''length'' is the rest of the bits plus 3. The decompressor copies ''length'' bytes from ''offset'' bytes behind to the output.
#** After all bits have been read, the ''counter'' is reset to the default value of 8, and the decompression loop continues.
 
== External links ==
* [https://hbfs.wordpress.com/2009/04/14/ad-hoc-compression-methods-rle/ Ad Hoc Compression Methods: RLE] describes various pixel-level RLE methods applied to a drawing of a Pokémon
* [https://git.jroatch.xyz/donut-nes/ donut-nes] a fast and efficient CHR compression method by JRoatch

Revision as of 02:28, 2 September 2021

Tile compression refers to techniques that allow fitting more graphics data into a smaller space. Programs using CHR ROM cannot use compressed tiles, as their tile data must be stored in the PPU's native format. But programs using CHR RAM can process tile data while copying it from PRG ROM to CHR RAM, and this processing allows storing more tiles in the same space.

Run-length encoding

Run-length encoding transforms runs of identical bytes into a shorter sequence of bytes that specifies the length of the run.

In NES tile data, byte-level run-length encoding works well when a row of 8 pixels in a tile is identical to the row above it. It also works well for nametable data because a horizontal run of blank tiles becomes a single tile.

Pixel-level run-length encoding is much slower but can achieve impressive results within a tile.

There are several different RLE data formats.

PCX

The PCX image format became popular on PC.

Value Meaning
00-BF Write this byte to the output.
C0-FF Read another byte, and write it to the output n - 192 times.

PackBits

The PackBits format was invented by Apple for MacPaint. It is also used in TIFF files and a few homebrew releases by Damian Yerrick.

Value Meaning
00-7F Copy n + 1 bytes from input to output.
80 No operation
81-FF Read another byte, and write it to the output 257 - n times.

Konami RLE

This format is used in the U.S. version of Contra, and the Japanese version of Simon's Quest. It can be decoded and encoded with the Python program GraveyardDuck. Compression ratio is more or less identical to PackBits.

Value Meaning
00-80 Read another byte, and write it to the output n times.
81-FE Copy n - 128 bytes from input to output.
FF End of compressed data

Blades of Steel uses a subset of this format reserving a special value for jumping to a new PPU address. See: Blades of Steel DataCrystal reference

GBA RLUnComp

The BIOS of the Game Boy Advance and Nintendo DS contains a decompressor for an RLE format very similar to PackBits and Konami. As described in GBATEK, it has a 4-byte size header followed by this:

Value Meaning
00-7F Copy n + 1 bytes from input to output.
80-FF Read one byte from input and write it to output n - 125 times.

PB53

This codec was conceived by Damian Yerrick as an alternative to PackBits for the Action 53 multicart, and a Python packer and 6502 unpacker are included in the Action 53 menu distribution. Unlike freeform RLE formats such as Konami and PackBits, PB53 operates on 16-byte units, making it easy to divide the decompressed data into fixed-size packets to be sent to the PPU during vblank while rendering is turned on. Like LZSS, PB53 uses unary coding on the lengths of literal runs to save on overhead from switching between literal and run modes. This means that like LZSS, it has a worst case expansion of 12.5%, but it works fairly well on real tile data and OK on nametable data, which have shorter runs than the high-resolution files for which PackBits was designed. It also has a special mode to accommodate the layout of Shiru's NROM games LAN Master, Lawn Mower, and Chase, which have many identical tiles between the two pattern tables to allow tile animation.

Each tile consists of several 8-byte planes, two planes in the NES implementation. For the first plane in a tile:

Value Meaning
00-7F PB8: After this control byte, copy the following byte from input to output. Then, for each bit in the control byte from 6 to 0, if the bit is 1, repeat the previous byte; otherwise, copy another byte from the input. This is somewhat similar to how control bytes are formatted in LZSS.
80 Write eight $00 bytes.
81 Write eight $FF bytes.
82 Copy one tile (16 bytes) starting one tile back. (Used for a repeated tile, such as the unused tiles in many games.)
83 Copy one tile starting one segment back, usually 4096 bytes. (Used for pattern tables that share tiles, as seen in several Shiru games. The decoder switches between two instances one segment apart, each with its own input stream and output buffer.)
84 Write sixteen $00 bytes. (Solid color 0)
85 Write eight $FF bytes then eight $00 bytes. (Solid color 1)
86 Write eight $00 bytes then eight $FF bytes. (Solid color 2)
87 Write sixteen $FF bytes. (Solid color 3)

For other planes:

Value Meaning
00-81 Same as first plane
82 Copy previous 8 bytes. (Used for 1-bit tiles with colors 0 and 3.)
83 Copy previous 8 bytes, bit-inverted. (Used for 1-bit tiles with colors 1 and 2.)

PB8 is the same as PB53 except that bit 7 of the control byte is not special: it still means repeat the previous byte. It is used in Haunted: Halloween '85 and Haunted: Halloween '86. From July 2019 to May 2020, the workalike boot ROM included with the Game Boy Color emulator SameBoy used PB8 for the emulator's logo.

PB16 is similar to PB8 with one change: each 1 bit means a repeat of the value two bytes back. (Bit 7 is not special, unlike in PB53.) This distance of two bytes is better for Game Boy and Super NES tile data and Super NES tile maps. It is used in the Game Boy ports of 240p Test Suite and Magic Floor.

PB8 and PB16 inspired the creation of PB12 for the SameBoy emulator's boot ROM. So-called "PB12" by NieDzejkob (Jakub Kądziołka) is tuned to the statistics of the 3-color antialiased version of the SameBoy logo. Control bytes are interleaved with literal bytes. Control byte 00000001 is a terminator and thus must not occur in the byte stream. Otherwise, each 2 or 4 bits of the control correspond to a 4-bit word.

Value Meaning
00 copy next byte from input
0100 Copy 1 byte back, ORed with the same byte shifted left 1
0101 Copy 1 byte back, ANDed with the same byte shifted left 1
0110 Copy 1 byte back, ORed with the same byte shifted right 1
0111 Copy 1 byte back, ANDed with the same byte shifted left 1
10 Copy 2 bytes back
11 Copy 1 byte back

RLEINC

This RLE variant was used by Joel Yliluoma in the Simon's Quest retranslation project. It is very efficient when compressing nametables, which often contain redundancy in more complex forms than simple runs of repeating bytes. Examples include brick walls, which repeat two tiles, and complex graphics that is formed from an ascending series of successive tiles. For bitmap compression, it is slightly inferior to simpler RLE methods.

Value Meaning
00–3F LIT: Copy (n+1) bytes from input to output backwards
40 END: End of stream
41–7F SEQ: Read next byte b. Put b, (n−0x3F) times; add 1 to b after each iteration
80–9F DBL: Read next byte b1, and next byte b2. Put b1, (n−0x7D) times; swap b2 and b1 after each iteration
A0–FF RUN: Read byte b. Put b, (0x101−n) times.

A compressor for this scheme can be found at http://bisqwit.iki.fi/src/nes-ppu_rleinc_compress.php.txt (PHP), and a decompressor at http://bisqwit.iki.fi/src/nes-ppu_rleinc_v2b.inc (CA65).

JRoatch made PBJ, which adds the PB8 mode from PB53 and a common-byte mode to a modified RLEINC.

Bit-based RLE

Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream, that contains integers of different bit widths.

When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, a simple reference RLE algorithm (PackBits) gets a 12% reduction in data size. However, if the algorithm is modified as indicated below, a 21% reduction is achieved. For comparison, the graphics specialized algorithm in PB53 achieves 25% for that data set. Tokumaru compression manages to reduce the data by 41%.

Bit sequence Meaning
0000 End of stream.
0nnn Copy the next n×8 bits, i.e. n bytes, to the output.
1nnn Read the next 8 bits, and output this byte n+2 times.

A possible reason why bit-based compression methods are not popular on the NES is that bit-shifts are cumbersome and slow with the 6502 CPU, as it can only shift one bit at a time. The above algorithm is still relatively simple to implement, as it operates on units of 4 bits for the input and full bytes for the output. Coincidentally, it also produced the best compression out of all bit-based RLE algorithms that were brute-force-tested for that dataset.

NES Stripe Image RLE

A RLE format mostly used by Nintendo for use in their Arcade ports as well as their Mario games, Also used in some homebrew games!

Format: dest, len, data, [dest, len, data, ]* end

dest: PPU destination address (2 bytes, big endian), to be written to $2006

len: Length (Byte) of PPU Buffer Data:

Length Value Meaning
00-3F Literal to right: Copy n + 1 bytes to video memory addresses increasing by 1
40-7F Run to right: Copy one byte n - 63 times to video memory addresses increasing by 1
80-BF Literal down: Copy n - 127 bytes to video memory addresses increasing by 32
C0-FF Run down: Copy one byte n - 191 times to video memory addresses increasing by 32

data: PPU Data to display

end: End of Data marker. Early games use $00; later games (particularly those with CHR RAM) may use any value with bit 7 set ($80-$FF) to allow writing to the first 16 tiles of video memory.

See also Popslide, a video memory update buffer framework using this format

SNES Stripe Image RLE

Same RLE format used by Nintendo as above, but for SNES.

Format: dest, len, data, end

dest: PPU Destination: $2116 and $2117

len: Length (Word) of PPU Buffer Data:

Length Value Meaning
0000-3FFF Copy n + 1 bytes to PPU buffer
4000-7FFF Copy n + 1 bytes to PPU buffer, with RLE
8000-BFFF Copy n + 1 bytes to PPU buffer, increment 32 bytes
C000-FFFF Copy n + 1 bytes to PPU buffer, with RLE, increment 32 bytes

data: PPU Data to display in 2-byte increments (or word increments)

end: Unlike the NES version, the end byte is $FF or $FFFF.

Pixel based

Codemasters

This is a Markov chain (predictive) algorithm that packs predictions in varying number of bits. It works on packets measured in whole tiles, and compresses mostly pixel by pixel, making it slow. Explained on forum.

Tokumaru

Tokumaru discovered an improvement to the way the frequency tables are changed in Codemasters algorithm, and released the compressor and decompressor as open source. And open-source rewrite of the encoder and decoder with slightly better performance can be downloaded at: http://bisqwit.iki.fi/source/tokumaru.html

The compressed data begins with a byte that tells how many tiles it encodes. 256 is maximum. Technically this byte can be ignored, if you already know how many tiles you are going to decode.

After the byte, any number of blocks follows. Each block begins with a color description table. This table tells how to encode transitions between colors in tiles belonging to this block.

There are four elements in this table, from 3 to 0, for color n. Each element begins with a two-bit integer ncolors[n], that tells how many different colors that may follow a pixel of this particular color. After the number of colors, comes a pivot color a that is encoded as follows:

Value Applicable when Meaning
nothing ncolors[n]=0 No pivot color
1 bit: 0 ncolors[n]>0 Pivot color a is 1 if n < 1, 0 otherwise.
2 bits: 10 ncolors[n]>0 Pivot color a is 2 if n < 2, 1 otherwise.
2 bits: 11 ncolors[n]>0 Pivot color a is 3 if n < 3, 2 otherwise.

The table of transition colors is then calculated using n, a, and the number of colors ncolors[n]. First, two other colors b and c are calculated. They are the first color indexes that are neither n nor a. E.g. if a=2 and n=1, b and c become 0 and 3 respectively.

When ncolors[n] is Table of possible transition colors next[n] is
0 []
1 [a]
2 [b,c]
3 [a,b,c]

For compression purposes, ideally ncolors[n] should be chosen to be the numbers of distinct colors that actually can follow the color n, based on measuring the tile data, and, if ncolors[n]=3, the pivot color a should be the color that is transitioned into most often from color n. This transition in tile data will be encoded in two bits, while the other transitions are encoded in three bits. For other values of ncolors[n], the choice of pivot color is mandated by the actual possible colors.

After the color description table, comes tile data encoding 16 bytes, or 8×8 pixels. Each tile is comprised of eight rows of pixels. Each pixel row begins with 1 bit, that tells whether the row is to be copied. If the bit is set, the previously decoded row is duplicated, and no other data is encoded for this pixel row. At the start of the block, the "previously decoded row" is assumed to contain zero bytes. The previous row is not reset between different tiles, unless a new block begins. If the bit was clear, eight pixels follow for x coordinates 0 to 7. Each pixel is encoded as follows, depending on the color c of the previous pixel:

Value Applicable when Meaning
2 bits x = 0 The color of the first pixel c is encoded as a 2-bit integer.
nothing ncolors[c]=0 c does not change, and nothing is encoded.
1 bit: 1 ncolors[c]>0 c does not change from previous pixel.
1 bit: 0 ncolors[c]=1 c becomes next[c][0].
2 bits: 00 ncolors[c]>1 c becomes next[c][0].
2 bits: 01 ncolors[c]=2 c becomes next[c][1].
3 bits: 010 ncolors[c]=3 c becomes next[c][1].
3 bits: 011 ncolors[c]=3 c becomes next[c][2].

After each full tile, if there are still remaining tiles to be decoded, comes one bit that tells what comes next. Value 1 means a new block start, with a new color description table. Value 0 means that more tile data will follow.

Common byte

Oracle common byte

This codec, used in The Legend of Zelda: Oracle of Seasons and The Legend of Zelda: Oracle of Ages according to Dwedit, is roughly comparable to RLE in complexity. For each 16-byte block, the compressor determines the most common byte in that block. The compressed data for each block starts with a 2-byte mask, then the most common byte, then other bytes in order that aren't the most common.

To decode a block: First read the two-byte mask. If the entire mask is zero, set the bit corresponding to the first byte to true. Then read the most common byte. For each bit in the mask, if the bit is 1, write the most common byte to output; otherwise, copy one byte.

Maximum expansion is 12.5% for any block that has 16 different bytes in it: two bytes of mask and 16 bytes of data.

LZSS

A lot of games on platforms after the NES use LZ77 family compression methods such as LZSS, which generalizes run-length encoding to allow copying data from several bytes ago, not just the previous byte. Few NES games use LZ77 because the NES's limited work RAM and limited access to video memory make it difficult to resolve back references. (Fewer still use LZW or any other LZ78-based method because of patents that subsisted through the NES, Super NES, and Nintendo 64 eras.)

In LZSS, the mask contains 8 commands, each either to copy a literal byte or to copy a back reference. determines whether the next 8 things are literal bytes or back references. Once all commands have been processed, read another mask.

Read a mask byte from input.
For each bit in the mask byte:
If the bit is 0, this is a literal:
Copy one byte from input to output.
Otherwise, this is a back-reference:
Read and decode a length and distance from input. These will be positive integers.
Copy length bytes from the previous output, distance bytes before now, to output.

LZSS flavors vary mainly in how they encode lengths and distances.

LZ77UnComp

The BIOS of the Game Boy Advance and Nintendo DS has a built-in decompressor for a straightforward LZSS flavor with 3- to 18-byte references into the previous 4096 bytes of output. The data format is described in Martin Korth's GBATEK.

Caution: In data intended for decompression directly to the GBA or DS video memory, the second byte of a 16-bit word cannot refer to the first byte of the same word. So the encoder must not write a run with distance 1 that starts at an odd offset.

Oracle LZ

This flavor of LZSS is used in The Legend of Zelda: Oracle games for Game Boy Color according to Dwedit.

An entire compressed block can be compressed with one of two subtypes of Oracle LZ: short word mode and long word mode. Short words use references of 2 to 8 bytes into the previous 32 bytes of output, and long words use references of 3 to 33 bytes into the previous 2048 bytes. (A length value of 0 means read an additional byte and use that as the length.) Only short words would work very well on NES.

Pokémon LZ

This compression scheme is used in the second-generation Pokémon games on the Game Boy. It is used for bitmap compression.

A byte n is read and split into two parts: code = n >> 5, and c = n & 0x1F. Byte 0xFF marks the end of the stream. Otherwise the code is interpreted as follows:

code Meaning
0 Copy the next c + 1 bytes to the output.
1 Read another byte, and write it to the output c + 1 times.
2 Read another byte b1 and byte b2, and write byte b1 to the output c + 1 times, swapping b1 and b2 after each write.
3 Write a zero byte (0x00) to the output c + 1 times.
4 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output.
5 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, reversing the bits in each byte.
6 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, decrementing the read position after each write (i.e. reverse the data).
7 Read another byte b. Change code = (c >> 2), and change c = (c & 3) × 256 + b. Re-interpret code according to this table.

Chrono Trigger LZ

This compression scheme is used in Square‘s Chrono Trigger for the SNES for compression of graphics and various data.

The data consists of packets. Each packet begins with a 16-bit offset, which gives the packet ending offset relative to the beginning of compressed data. At that offset, there is always a control byte. The first control byte sets the size of offsets: If the byte is < 0xC0, then offsetsbits=12. Else offsetbits=11. Interpreting the offsetbits is done only once. The higher-order two bits in the control bytes of all other packets are ignored. The counter is assigned a default value of 8.

The decompression loop goes as follows:

  1. If the packet end has been reached, a control byte is read, and counter is assigned the low 6 bits of that byte (i.e. counter = byte & 0x3F). If the counter is zero, the decompression is complete and ends there. If the counter was nonzero, a new 16-bit packet end offset is read.
  2. If the end of the packet has not yet been reached, a mask byte is read.
    • If the mask byte is zero, then next eight bytes are copied verbatim to the output. The counter is not affected.
    • If the mask byte was nonzero, its each bit is read, beginning from the lowest-order bit. The number of bits to be read is determined by counter (which is in range 1—63, i.e. it can be both smaller and greater than eight).
      • If the bit is zero, a single byte is copied verbatim to the output.
      • If the bit is nonzero, a 16-bit number is read from the input. offset becomes the lowest-order offsetbits from that number, and length is the rest of the bits plus 3. The decompressor copies length bytes from offset bytes behind to the output.
      • After all bits have been read, the counter is reset to the default value of 8, and the decompression loop continues.

External links