xascii85

Standard Encoding interface on top of "encoding/ascii85".

Copyright

Copyright (c) 2022 Teal.Finance contributors

This file is part of Teal.Finance/BaseXX licensed under the MIT License. See the LICENSE file or https://opensource.org/licenses/MIT. SPDX-License-Identifier: MIT

Ascii85 advantages

The main idea is to encode in chunks of 4 bytes, instead of the 3 bytes used by Base64.

There are 95 printable ASCII characters including the space. To represent 4 bytes, 5 printable ASCII characters are required:

 95⁵ = 7 737 809 375   <-- Minimum 5 printable ASCII characters
256⁴ = 4 294 967 296
 95⁴ =    81 450 625

The smallest alphabet that still fits 4 bytes into 5 characters has 85 characters:

 86⁵ = 4 704 270 176
 85⁵ = 4 437 053 125   <-- Minimum 85 different ASCII characters
256⁴ = 4 294 967 296
 84⁵ = 4 182 119 424
 83⁵ = 3 939 040 643

Therefore, 85 is the minimum number of distinct characters needed to encode any sequence of 4 bytes as 5 printable ASCII characters.
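The arithmetic above can be checked with a short program. This is only an illustrative sketch: the helper names pow5 and minBase are made up for this example and are not part of the package.

```go
package main

import "fmt"

// pow5 returns n⁵. The values used here fit easily in a uint64.
func pow5(n uint64) uint64 { return n * n * n * n * n }

// minBase returns the smallest alphabet size n such that 5 characters
// (n⁵ combinations) can represent every 4-byte value (256⁴ combinations).
// Hypothetical helper, for illustration only.
func minBase() uint64 {
	const combos = 1 << 32 // 256⁴ = 4 294 967 296
	for n := uint64(2); ; n++ {
		if pow5(n) >= combos {
			return n
		}
	}
}

func main() {
	fmt.Println(pow5(84)) // 4182119424, too small for 256⁴
	fmt.Println(pow5(85)) // 4437053125, large enough
	fmt.Println(minBase()) // 85
}
```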

Interface

The idea is to provide the same interface as "encoding/base64". See https://pkg.go.dev/encoding/base64

func NewEncoding(encoder string) *Encoding

interface Encoding {
    Decode(dst, src []byte) (n int, err error)
    Encode(dst, src []byte) (n int)
    // Unlike encoding/base64, Encode() returns the number of bytes written.
    // Because an all-zero group is encoded as a single "z", the Ascii85
    // encoded length cannot be derived from the input length alone,
    // whereas it can with Base64.
    
    DecodeString(s string) ([]byte, error)
    EncodeToString(src []byte) string
    
    DecodedLen(n int) int // Returns the maximum decoded length.
    EncodedLen(n int) int // Returns the maximum encoded length.
    
    // Not implemented.
    // Strict() *Encoding
    // WithPadding(padding rune) *Encoding
}

Definition in PostScript documentation

Ascii85 encodes binary data in an ASCII base-85 representation. This encoding uses nearly all of the printable ASCII character set. The resulting expansion factor is 4:5, making this encoding much more efficient than hexadecimal.

Encoding alphabet

ASCII characters from 0x21 ! through 0x75 u.

Comparison to other encodings

Base95  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ (and space)
Base94  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Base92  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%& ()*+,-./:;<=>?@[ ]^_`{|}~
Base91  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%& ()*+, ./:;<=>?@[ ]^_`{|}~
Ascii85 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu    z!"#$%&'()*+,-./:;<=>?@[\]^_`
Z85     0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz! #$%& ()*+ -./: <=>?@[ ]^  { }
Base70  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz          + -./           _    ~
Base64  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz          +   /
Base62  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Base58   123456789ABCDEFGH JKLMN PQRSTUVWXYZabcdefghijk mnopqrstuvwxyz
Hexa    0123456789ABCDEF

Specification in PostScript documentation

The ASCII85Encode filter encodes binary data in the ASCII base-85 encoding. Generally, for every 4 bytes of binary data, it produces 5 ASCII printing characters in the range ! through u. It inserts a newline in the encoded output at least once every 80 characters, thereby limiting the lengths of lines.

When the ASCII85Encode filter is closed, it writes the 2-character sequence ~> as an EOD marker.
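The line-length limit and the EOD marker described above could be applied to an already encoded string roughly as follows. This is a minimal sketch, assuming the whole encoded text is available in memory; frame is a hypothetical helper and is not claimed to be part of this package or of "encoding/ascii85".

```go
package main

import (
	"fmt"
	"strings"
)

// frame splits encoded into lines of at most width characters and
// appends the "~>" EOD marker, starting a new line for the marker if it
// would otherwise push the last line past width.
// Hypothetical helper, for illustration only.
func frame(encoded string, width int) string {
	if width < 2 {
		width = 80 // fall back to the PostScript limit
	}
	var sb strings.Builder
	for len(encoded) > width {
		sb.WriteString(encoded[:width])
		sb.WriteByte('\n')
		encoded = encoded[width:]
	}
	sb.WriteString(encoded)
	if len(encoded)+2 > width {
		sb.WriteByte('\n')
	}
	sb.WriteString("~>")
	return sb.String()
}

func main() {
	fmt.Printf("%q\n", frame("9jqo^", 80)) // "9jqo^~>"
	fmt.Printf("%q\n", frame("abcdef", 4)) // "abcd\nef~>"
}
```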

Binary data bytes are encoded in 4-tuples (groups of 4). Each 4-tuple is used to produce a 5-tuple of ASCII characters. If the binary 4-tuple is (b1 b2 b3 b4) and the encoded 5-tuple is (c1 c2 c3 c4 c5), then the relation between them is

(b1 × 256³) + (b2 × 256²) + (b3 × 256¹) + b4 = (c1 × 85⁴) + (c2 × 85³) + (c3 × 85²) + (c4 × 85¹) + c5

In other words, 4 bytes of binary data are interpreted as a base-256 number and then converted into a base-85 number. The five “digits” of this number, (c1 c2 c3 c4 c5), are then converted into ASCII characters by adding 33, which is the ASCII code for !, to each. ASCII characters in the range ! to u are used, where ! represents the value 0 and u represents the value 84.

As a special case, if all five digits are 0, they are represented by a single character z instead of by !!!!!.
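The relation above, including the z special case, can be sketched for a single group as follows. The functions encodeGroup and decodeGroup are illustrative names chosen for this example; they handle complete 4-byte groups only and are not this package's implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeGroup converts one 4-byte group (b1 b2 b3 b4) into Ascii85 text.
// Hypothetical helper, for illustration only.
func encodeGroup(b [4]byte) string {
	v := binary.BigEndian.Uint32(b[:]) // the group read as a base-256 number
	if v == 0 {
		return "z" // special case: all five digits are 0
	}
	var c [5]byte
	for i := 4; i >= 0; i-- {
		c[i] = byte(v%85) + '!' // digit 0..84 mapped to '!'..'u' (add 33)
		v /= 85
	}
	return string(c[:])
}

// decodeGroup reverses encodeGroup. It reports false for malformed input:
// wrong length, characters outside '!'..'u', or a value above 256⁴ - 1.
func decodeGroup(s string) ([4]byte, bool) {
	var b [4]byte
	if s == "z" {
		return b, true
	}
	if len(s) != 5 {
		return b, false
	}
	var v uint64
	for i := 0; i < 5; i++ {
		if s[i] < '!' || s[i] > 'u' {
			return b, false
		}
		v = v*85 + uint64(s[i]-'!')
	}
	if v > 0xFFFFFFFF {
		return b, false // 85⁵ > 256⁴, so some 5-tuples have no 4-byte value
	}
	binary.BigEndian.PutUint32(b[:], uint32(v))
	return b, true
}

func main() {
	fmt.Println(encodeGroup([4]byte{'M', 'a', 'n', ' '})) // 9jqo^
	fmt.Println(encodeGroup([4]byte{}))                   // z

	b, ok := decodeGroup("9jqo^")
	fmt.Printf("%q %v\n", b[:], ok) // "Man " true
}
```

For the group "Man ", the base-256 value is 1 298 230 816, whose base-85 digits are 24, 73, 80, 78, 61. Adding 33 to each gives the characters 9, j, q, o, ^.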