Working With Binary Data in Python

Working with binary data puzzled me for quite some time, so here is a post explaining some of my findings. First we define a byte array:

s = bytearray(b"Hello World")
for i in s:
    print(i, end=" ")
# prints: 72 101 108 108 111 32 87 111 114 108 100
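
The numbers above show up because iterating over a bytearray gives integers, not one-character strings. Here is a small sketch of that behaviour; the variable names (values, first, head, text) are only mine, for illustration:

```python
# Iterating over a bytearray yields integers in range(256).
s = bytearray(b"Hello World")
values = list(s)
print(values)

# Indexing gives a single int, slicing gives another bytearray.
first = s[0]
head = s[:5]
print(first, head)

# Unlike bytes, a bytearray is mutable: replace the first byte in place.
s[0] = ord("J")
text = s.decode("ascii")
print(text)
```

So a bytearray behaves like a mutable list of small integers that can be decoded back to text when needed.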

Now, let's write this data to a file:

import struct

f = open('helloword.bin', 'wb')
for i in s:
    f.write(struct.pack("I", i))  # "I" packs each value as an unsigned int
f.close()

Let's inspect the file created:

$ du -h helloword.bin
4.0K    helloword.bin
$ file helloword.bin
helloword.bin: data
$ less helloword.bin
"helloword.bin" may be a binary file.  See it anyway?

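The shell tools only tell us the file is "data", so it can help to read it back from Python. The sketch below is one way to do that, assuming the file was written with struct.pack("I", ...) as above. The temporary path and the names raw, values and decoded are mine and hypothetical, chosen so the example does not touch real files:

```python
import os
import struct
import tempfile

s = bytearray(b"Hello World")

# Hypothetical temporary location, so the sketch leaves the real files alone.
path = os.path.join(tempfile.mkdtemp(), "helloword.bin")

# Write each byte value as a native unsigned int, as in the post.
with open(path, "wb") as f:
    for i in s:
        f.write(struct.pack("I", i))

# Read the whole file back as raw bytes.
with open(path, "rb") as f:
    raw = f.read()

# iter_unpack walks the buffer one record at a time and yields 1-tuples.
values = [v for (v,) in struct.iter_unpack("I", raw)]
decoded = bytes(values).decode("ascii")

print(len(raw))   # number of bytes actually stored in the file
print(decoded)
```

The length of raw is the real size of the file in bytes, which is a more precise figure than the rounded disk usage that du -h prints.
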
Binary file sizes

Let's write "hello world" to a text file as plain text:

f = open('helloword.txt', 'w')
f.write("hello world")
f.close()

Once again we can inspect the file:

$ du -h helloword.txt
4.0K    helloword.txt

Now, what happens if we make a longer binary array?

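import struct

s = bytearray(b'Hello World!')
for i in range(10000):
    s.append(ord('!'))  # one extra byte per iteration, mirroring the text version below

f = open('longhelloword.bin', 'wb')
for i in s:
    f.write(struct.pack("I", i))
f.close()

hello = "Hello World!"
for i in range(10000):
    hello = hello + "!"

f = open('longhelloword.txt', 'w')
f.write(hello)
f.close()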

In a shell, examining the file sizes:

$ du -h longhelloword.bin 
40K     longhelloword.bin 
$ du -h longhelloword.txt 
12K     longhelloword.txt

Wait a minute! Why is the binary file almost four times bigger?

The answer is that it depends on the format specifier passed to struct.pack. We used "I", an unsigned int, so every character took up 4 bytes in the file. When we saved the text, each character was written to the file as a single char, which takes only one byte.
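
To see this for yourself, struct.calcsize reports how many bytes a format string occupies. The following is a small sketch; the list of format characters and the value 72 (the letter "H") are just my picks for illustration:

```python
import struct

# Size in bytes of one packed value for a few format characters
# (native sizes, so the larger ones can vary between platforms).
for fmt in ("b", "B", "H", "I", "Q"):
    print(fmt, struct.calcsize(fmt))

# The same value packed as an unsigned int and as an unsigned char.
# "<" forces little-endian byte order with the standard 4-byte int size.
packed_int = struct.pack("<I", 72)
packed_char = struct.pack("B", 72)
print(packed_int)   # b'H\x00\x00\x00'
print(packed_char)  # b'H'
```

The three zero bytes after the "H" are exactly the padding that made the binary file so much larger than the text file.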

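If we repeat the above with struct.pack("b", i), a signed char that takes one byte, the two files end up the same size. (This works here because every value is below 128; for byte values above 127 the unsigned "B" would be needed.)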

f = open('longhellowordwithchar.bin', 'wb')
for i in s:
    f.write(struct.pack("b", i))  # one byte per value
f.close()

And in the shell:

$ du longhellowordwithchar.bin
12      longhellowordwithchar.bin
$ du longhelloword.txt
12      longhelloword.txt
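
Since du reports disk usage in whole blocks, the numbers above are rounded. For exact byte counts, os.path.getsize can be used instead. The sketch below assumes the byte array is "Hello World!" followed by 10000 exclamation marks, matching the text string. The helper write_packed, the temporary directory and the file names are hypothetical and only mine. It also writes the bytearray directly, without struct, for comparison:

```python
import os
import struct
import tempfile

# Assumed data: the same content as the long text string in the post.
s = bytearray(b"Hello World!" + b"!" * 10000)
tmp = tempfile.mkdtemp()  # hypothetical scratch directory


def write_packed(name, fmt):
    """Pack every byte of s with fmt, write it out, return the file size in bytes."""
    path = os.path.join(tmp, name)
    with open(path, "wb") as f:
        for i in s:
            f.write(struct.pack(fmt, i))
    return os.path.getsize(path)


size_int = write_packed("as_int.bin", "I")    # unsigned int per value
size_char = write_packed("as_char.bin", "B")  # unsigned char per value

# A bytearray can also be written to a binary file as it is.
raw_path = os.path.join(tmp, "raw.bin")
with open(raw_path, "wb") as f:
    f.write(s)
size_raw = os.path.getsize(raw_path)

print(size_int, size_char, size_raw)
```

Packing with "B" and writing the bytearray directly give files of the same size, so for plain byte data struct is not really needed; it earns its keep when the values are wider than one byte.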


This entry was tagged: programming, python
