By Tim McNamara

Calculating Shannon entropy on octet streams can be confusing, especially for those who lack a background in mathematics, and in many cases it is not necessary. A more intuitive calculation, which is also easier to implement, is the average absolute difference between consecutive octets.
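For comparison, here is a minimal sketch of the Shannon calculation over octet frequencies, H = −Σ p·log2(p), where p is the fraction of the input taken up by each distinct octet value. It is not from the original article, and `shannon_entropy` is a hypothetical name. The result is in bits per octet and ranges from 0 (a single repeated value) to 8 (all 256 values equally frequent).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of an octet string, in bits per octet."""
    if not data:
        raise ValueError("at least one octet is required")
    n = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    # p * log2(1/p), written as (c/n) * log2(n/c) so every term is non-negative
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(shannon_entropy(b"aaaa"))            # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```

The frequency table and logarithm are the parts the article describes as confusing; the average-difference measure below replaces them with subtraction and a mean.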

An informal description of the algorithm:

  • Treat each octet as a number (0 to 255) and find the absolute difference between it and its successor. Pad the sequence with a 0 at each end, where there is no neighbour to compare against.
  • Exclude the left-most and right-most differences, since they involve the padding.
  • Take the average of the remaining differences.
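Written as a formula, for octet values x_1 to x_n (n ≥ 2), the steps above reduce to the mean of the n − 1 consecutive absolute differences; the two padded end values are exactly the ones that get discarded. The notation and the worked example are ours, not the author's.

```latex
% x_1, ..., x_n are the octet values (0 to 255), with n >= 2
\bar{d} = \frac{1}{n-1} \sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert

% Worked example: "Hi!" is the octet sequence (72, 105, 33)
% padded differences: |72 - 0|, |105 - 72|, |33 - 105|, |0 - 33| = 72, 33, 72, 33
% after dropping the first and last: 33, 72
\bar{d} = \frac{33 + 72}{2} = 52.5
```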

An example implementation in Python:

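def entropy(data, encoding="utf-8"):
    # Text must be encoded to octets first; bytes-like input is copied as-is.
    if isinstance(data, str):
        data = data.encode(encoding)
    data = list(bytes(data))
    if len(data) < 2:
        raise ValueError("at least two octets are required")
    # Pad with 0 on both ends so every octet has a neighbour to subtract.
    A = [0] + data
    B = data + [0]
    # len(data) + 1 differences; drop the first and last, which involve padding.
    diff = [abs(B[i] - A[i]) for i in range(len(data) + 1)][1:-1]
    return sum(diff) / len(diff)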
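The function above holds the whole input in memory. Because each difference depends only on an octet and its predecessor, the same mean can be accumulated over a stream of chunks in constant memory. This is a minimal sketch, not part of the original article: `average_difference_stream` and `read_chunks` are hypothetical names, and the 64 KiB chunk size is an arbitrary assumption.

```python
from typing import Iterable, Iterator


def average_difference_stream(chunks: Iterable[bytes]) -> float:
    """Mean absolute difference between consecutive octets across all chunks."""
    total = 0
    count = 0
    prev = None  # last octet seen so far; carried across chunk boundaries
    for chunk in chunks:
        for octet in chunk:  # iterating bytes yields ints 0..255
            if prev is not None:
                total += abs(octet - prev)
                count += 1
            prev = octet
    if count == 0:
        raise ValueError("at least two octets are required")
    return total / count


def read_chunks(path: str, size: int = 65536) -> Iterator[bytes]:
    """Yield a binary file in fixed-size chunks (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk


print(average_difference_stream([b"Hi", b"!"]))  # 52.5
```

For a file, a call such as `average_difference_stream(read_chunks("example.bin"))` (the path is a placeholder) gives the same value as the in-memory version, because the last octet of each chunk is compared with the first octet of the next.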


Metadata

Zenodo.6644371

Published: 13 Jun, 2022

CC BY