Extending run-length encoding to support arbitrary byte streams with hexadecimal notation
By Tim McNamara
Conventionally, run-length encoding is restricted to a fixed alphabet. Because the encoded form is often transmitted over human-readable media, this alphabet is usually restricted to printed characters. That makes run-length encoding unsuitable for transmitting arbitrary byte strings.
To address this, run-length encoding can be extended by converting the numerical value of each byte to its hexadecimal form. Hexadecimal numbers are often prefixed with
0x. We can replacing
0 with the actual count, in decimal, to allow for parsing to be fast and unambiguous.
Consider the input string
Hello. The following two lines illustrate the different representations in conventional run-length encoding and the extended variant proposed here:
1H 1e 2l 1o
1x48 1x65 2x6c 1x6f
An example implementation in Python:
def run_length_encode(text, delimiter=''): count = 1 hex_digits = [hex(ord(x)) for x in text] encoded_fragments =  for (i, current) in enumerate(hex_digits): try: next = hex_digits[i+1] except IndexError: next = None if current != next: fragment = str(count) + current[1:] encoded_fragments.append(fragment) count = 0 count += 1 return delimiter.join(encoded_fragments)