While working through one of the various planning stages of a new section of one of my projects, I realized that I was going to have a lot of integers on hand that repeated frequently. I know I’m going to need these lists at a later time, so my two options is to save it somewhere or generate the list every time I need it. Knowing how to generate it was easy, but storing that information as compactly as possible was not.
Doing a little research I decided to look into run-length encoding. I just looked up encoding and there are many examples on the net. It turns out the example I picked was broken, so it required a little bit of fixing to get it working.
In my case, I know that I will have a list and it will always be an integer that I can convert to a chr.
This still probably looks like a lot of examples out there:
def encode(data):
count = 1
rle = []
prev = ''
for d in data:
d = chr(d)
if prev:
if d != prev:
rle.append([count,prev])
count = 1
prev = d
else:
count += 1
else:
prev = d
#catch the last item in the list
rle.append((count,prev))
Now this is great, I get my list [[1,b],[4,c], etc] and it’s roughly half the size I started with. But there are still a lot of areas that have single items ([1,b]). Those can be compacted together in a single tuple.
For ease of the experiment I put it in it’s own for-loop:
# now pick up all the one item objects and compress those if
# possible
result = []
queue = []
for r in rle:
if r[0] == 1:
queue.append(r)
else:
if queue: #check if anything has been added to the queue
#to process
tmp = ''
for o in queue:
tmp += o[1]
result.append([-1,tmp]) #add in the queue
result.append(r)
queue = []
return result
Now, all the single item objects would be gathered together and the list recreated. It’s not perfect but it got the job done for me for the time I wanted to spend on this.
From here, I could create a string out of the list and save it to file. For added fun, I then gzipped the file. My example test came out to be about 600 bytes. Huge difference from 8kb from saving the int list as a comma-delimited text file. (Though that wasn’t gzipped so it’s not the best comparison)
This is great, but I can’t really use this compressed list, so I need to unpack my list. A short time later, I have this semi-elegant decoder. And thus I have my int-list back and I can continue on my processing.
def decode(data):
result = []
for d in data:
if d[0] == -1:
for n in d[1]:
result.append(ord(n))
else:
for i in range(0,d[0]):
result.append(ord(d[1]))
return result
I can see a few places to improve on it but since this was just an exercise for myself I’m going to leave it like this for now. Once I decide what I’m doing for my project I may come back to this to clean it up.
Leave a Reply