andrewherd.me

Technical Artist

Python RLE Encoding

While working through one of the various planning stages of a new section of one of my projects, I realized that I was going to have a lot of integers on hand that repeated frequently. I know I’m going to need these lists at a later time, so my two options is to save it somewhere or generate the list every time I need it. Knowing how to generate it was easy, but storing that information as compactly as possible was not.

Doing a little research I decided to look into run-length encoding. I just looked up encoding and there are many examples on the net. It turns out the example I picked was broken, so it required a little bit of fixing to get it working.

In my case, I know that I will have a list and it will always be an integer that I can convert to a chr.
This still probably looks like a lot of examples out there:

def encode(data):
    count = 1
    rle = []
    prev = ''
    for d in data:
        d = chr(d)
        if prev:
            if d != prev:
                rle.append([count,prev])
                count = 1
                prev = d
            else:
                count += 1
        else:
            prev = d

    #catch the last item in the list
    rle.append((count,prev))

Now this is great, I get my list [[1,b],[4,c], etc] and it’s roughly half the size I started with. But there are still a lot of areas that have single items ([1,b]). Those can be compacted together in a single tuple.

For ease of the experiment I put it in it’s own for-loop:

    # now pick up all the one item objects and compress those if
    # possible
    result = []
    queue = []
    for r in rle:
        if r[0] == 1:
            queue.append(r)
        else:
            if queue:   #check if anything has been added to the queue
                        #to process
                tmp = ''
                for o in queue:
                    tmp += o[1]
                result.append([-1,tmp]) #add in the queue
            result.append(r)
            queue = []
    
    return result

Now, all the single item objects would be gathered together and the list recreated. It’s not perfect but it got the job done for me for the time I wanted to spend on this.

From here, I could create a string out of the list and save it to file. For added fun, I then gzipped the file. My example test came out to be about 600 bytes. Huge difference from 8kb from saving the int list as a comma-delimited text file. (Though that wasn’t gzipped so it’s not the best comparison)

This is great, but I can’t really use this compressed list, so I need to unpack my list. A short time later, I have this semi-elegant decoder. And thus I have my int-list back and I can continue on my processing.

def decode(data):
    result = []
    for d in data:
        if d[0] == -1:
            for n in d[1]:
                result.append(ord(n))
        else:
            for i in range(0,d[0]):
                result.append(ord(d[1]))

    return result

I can see a few places to improve on it but since this was just an exercise for myself I’m going to leave it like this for now. Once I decide what I’m doing for my project I may come back to this to clean it up.

Leave a Reply