By Tim McNamara

Conventionally, run-length encoding is restricted to a fixed alphabet. Because the encoded form is often transmitted over human-readable media, this alphabet is usually restricted to printed characters. That makes run-length encoding unsuitable for transmitting arbitrary byte strings.

To address this, run-length encoding can be extended by converting the numerical value of each byte to its hexadecimal form. Hexadecimal numbers are often prefixed with 0x. We can replacing 0 with the actual count, in decimal, to allow for parsing to be fast and unambiguous.

Consider the input string Hello. The following two lines illustrate the different representations in conventional run-length encoding and the extended variant proposed here:

  • 1H 1e 2l 1o
  • 1x48 1x65 2x6c 1x6f

An example implementation in Python:

def run_length_encode(text, delimiter=''):
  count = 1
  hex_digits = [hex(ord(x)) for x in text]
  encoded_fragments = []
  
  for (i, current) in enumerate(hex_digits):
    try:
      next = hex_digits[i+1]
    except IndexError:
      next = None
    
    if current != next:
      fragment = str(count) + current[1:]
      encoded_fragments.append(fragment)
      count = 0
    
    count += 1
     
  return delimiter.join(encoded_fragments)

Comments

Please log in to add a comment.
Authors

Tim McNamara

Metadata

Zenodo.5606102

Published: 27 Oct, 2021

Cc by