MicroPython 'ustruct' Module
Introduction
Traditionally the Python module struct is used to convert between Python data structures and C structs. The MicroPython equivalent is the module ustruct which implements nearly all of the methods and functions found in the Python module[1].
The ustruct module is simple and easy to use. It exposes five methods:
- ustruct.calcsize()
- ustruct.pack()
- ustruct.unpack()
- ustruct.pack_into()
- ustruct.unpack_from()
It is necessary to use an import statement, for example:
import ustruct
before MicroPython programs can access the module's methods.
In MicroPython, import ustruct and import struct are equivalent and can be used interchangeably. However, for consistency this tutorial will use the module name ustruct. The ‘u’ in ustruct symbolizes the ‘Micro’ in MicroPython.
While many microcontroller programmers may never have cause to use this module, the ustruct module does have its uses. Complex record structures can be defined and used to pack data into a very memory-efficient binary format. Buffers formatted with ustruct (typically bytearrays) are a very easy way to pack or unpack data records in a random-access manner.
Uses include (but are not limited to) reading and writing binary log files, reformatting binary data from a sensor and buffering data to go to an output device.
ustruct Format Codes
The ustruct module packs (and unpacks) numerical and string values into (and from) binary data. The structure of this binary data is defined by format strings composed of standardised codes.
Byte Order Codes
The first character of the format string determines the order in which the bytes are packed. There are two possibilities: little-endian and big-endian.
A big-endian system stores the most significant byte (MSB) of the data at the smallest memory address and the least significant byte (LSB) at the largest.
A little-endian system operates in reverse, with the LSB stored at the smallest address and the MSB stored at the largest. An example best explains this:
Consider the base10 (decimal) value: 123456789
It has the hexadecimal value: 0x075BCD15
This can be broken down into 4 x bytes: 07, 5B, CD, 15
MSB → → → → → LSB
The order that these four bytes are stored in memory
depends on whether the ordering is little-endian
or big-endian.
Memory Location: x01 x02 x03 x04
big-endian: 07 5B CD 15
little-endian: 15 CD 5B 07
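The byte layout shown above can be reproduced with a short sketch. It packs the same value as a 4-byte unsigned integer using the '<' (little-endian) and '>' (big-endian) codes described later in this section. The try/except import is an assumption added here so that the sketch also runs on desktop Python, where the module is named struct.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

value = 123456789  # 0x075BCD15

little = ustruct.pack('<I', value)  # LSB is stored first
big = ustruct.pack('>I', value)     # MSB is stored first

# List the individual bytes in storage order.
print('little-endian:', [hex(b) for b in little])
print('big-endian:   ', [hex(b) for b in big])
```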
The MicroPython usys module[2] provides the byteorder property that reports whether the host system is big-endian or little-endian. This property returns the string 'big' or 'little' respectively.
Example 1
# Report on whether this system is
# big-endian or little-endian.
import usys
s = usys.byteorder + '-endian'
print('micro:bit byte order is:', s)
Output:
micro:bit byte order is: little-endian
Table 1 lists and describes the byte order codes.
Character | Byte Order |
---|---|
@ | native |
< | little-endian |
> | big-endian |
! | network (always big-endian) |
The byte order code may be omitted from the format string in which case it defaults to the @ (native) code. The byte order of the current system is then used.
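A minimal sketch of the two equivalences in Table 1: a format string without a byte order code behaves like '@', and '!' packs in the same order as '>'. The value 258 is chosen here only because its two bytes differ; the try/except import is an assumption so the sketch also runs on desktop Python.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

value = 258  # 0x0102, so the two bytes are different

# Omitting the byte order code is the same as writing '@' (native).
default_packed = ustruct.pack('H', value)
native_packed = ustruct.pack('@H', value)
print(default_packed == native_packed)

# '!' (network) always packs big-endian, exactly like '>'.
network_packed = ustruct.pack('!H', value)
big_packed = ustruct.pack('>H', value)
print(network_packed == big_packed)
```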
Data Type Codes
Following the byte order code (if specified) are any number of data type formatting characters. These describe in detail the structure of the binary data to be packed or unpacked. Table 2 lists and describes every code available with the micro:bit's version of MicroPython.
The packed size is given in bytes.
Character | C Type | Python Type | Packed Size |
---|---|---|---|
b | signed char | integer | 1 |
B | unsigned char | integer | 1 |
h | short | integer | 2 |
H | unsigned short | integer | 2 |
i | int | integer | 4 |
I | unsigned int | integer | 4 |
l | long | integer | 4 |
L | unsigned long | integer | 4 |
q | long long | integer | 8 |
Q | unsigned long long | integer | 8 |
f | float | float | 4 |
d | double | float | 8 |
s | char[] | bytes[3] | set by count prefix |
The 's' format code provides a way to pack strings into a binary structure. The size to be allocated in the packed binary structure is specified with an integer prefix. If the allocated size is larger than the size of the string, the binary data is padded out with nulls, i.e. \x00.
's' Example A
Format string: '<5s'
String to pack: 'Hello'
Packed string: b'Hello'
's' Example B
Format string: '<10s'
String to pack: 'Hello'
Packed string: b'Hello\x00\x00\x00\x00\x00'
's' Example C
Format string: '<2s'
String to pack: 'Hello'
Packed string: b'He'
All three examples force the use of little-endian with the '<' byte order code.
Example A shows the string packed to its exact size i.e. 5 bytes.
In Example B the space (10 bytes) allocated for the string is larger than the requirement for the string (5 bytes). The packed structure is padded out with 5 nulls (\x00).
In Example C there is insufficient space (only 2 bytes) allocated for the string of 5 bytes. An exception (error) isn't raised; rather, the string is truncated to fit the allocated space as defined by the format string.
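The three 's' examples can be checked with a short sketch. A bytes literal is used instead of a str here as an assumption, because desktop Python only accepts bytes for the 's' code; the try/except import is likewise only there so the sketch runs on both platforms.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

text = b'Hello'  # 5 bytes

exact = ustruct.pack('<5s', text)      # field size matches the data
padded = ustruct.pack('<10s', text)    # field is larger: padded with \x00
truncated = ustruct.pack('<2s', text)  # field is smaller: data is cut short

print(exact)
print(padded)
print(truncated)
```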
Here are a couple of general examples of ustruct formatting strings:
Example A
'<if3s' ⇒ little-endian byte order, a 4-byte signed integer, followed by a 4-byte float, followed by a 3-character string.
Example B
'@3L2dH' ⇒ native byte order, 3 x 4-byte unsigned integers, followed by 2 x 8-byte floats, followed by a 2-byte unsigned integer.
Note the use of count prefixes in Example B, which indicate that more than one of the specified data type will follow.
It is important not to confuse signed and unsigned integers. If a packed unsigned integer is unpacked as a signed integer (and vice versa) there will be unexpected results.
Attempting to pack an integer that is too large for the defined binary field will not cause an error[4]. Instead, the integer is packed into the defined space and the most significant bytes that don't fit are dropped.
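The two pitfalls above can be illustrated with a small sketch. The first part unpacks the same byte as unsigned and as signed. The second part masks an oversized value down to 16 bits before packing, which makes the dropped high bytes explicit. The masking step is an assumption added for portability, since desktop Python raises an error for out-of-range values rather than truncating them.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

# One byte, 0x83, read back with two different type codes.
packed = ustruct.pack('<B', 131)
as_unsigned = ustruct.unpack('<B', packed)[0]
as_signed = ustruct.unpack('<b', packed)[0]
print(as_unsigned, as_signed)

# A value that needs more than 2 bytes, reduced to its low 16 bits
# before being packed into an 'H' (unsigned short) field.
too_big = 0x12345
masked = too_big & 0xFFFF
packed16 = ustruct.pack('<H', masked)
print(hex(masked), packed16)
```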
calcsize() Method
Often it is necessary to know how many bytes the packed data will occupy from a given format string. This becomes important with the use of the pack_into() method that is described with examples in the next section.
The calcsize() method is provided for this purpose. It is simple to use:
Syntax:
ustruct.calcsize(format)
Where:
format : is a ustruct format string.
Examples:
ustruct.calcsize('<if3s') ⇒ 11
ustruct.calcsize('@3L2dH') ⇒ 34
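A minimal sketch of how calcsize() is typically used to size a buffer. It also compares an explicit byte order code with the native code, because native mode may insert alignment padding between fields, so the two sizes can differ on a given platform. The record count of 4 is an arbitrary value chosen for illustration.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

# With '<' the fields are packed back to back: 1 + 4 bytes.
packed_size = ustruct.calcsize('<bi')
# With '@' padding may be added so that the int is aligned.
native_size = ustruct.calcsize('@bi')
print(packed_size, native_size)

# Size a buffer for several records of the same format.
record_format = '<if3s'
record_size = ustruct.calcsize(record_format)
record_count = 4  # arbitrary count for illustration
buffer = bytearray(record_count * record_size)
print(record_size, len(buffer))
```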
Packing and Unpacking Data
This module is all about packing numerical and string data into an efficient binary format and correspondingly being able to unpack the binary structure back to the required native format.
The packed binary structure is returned as a bytes object.
The ustruct module provides two pairs of pack/unpack methods for achieving just that: pack() / unpack() and pack_into() / unpack_from().
pack() and unpack() Methods
These two methods are the simplest of the pack/unpack combinations.
Syntax:
ustruct.pack(format, v1, v2...)
Returns a bytes object with values v1, v2... packed using the format string format.
ustruct.unpack(format, buffer)
Returns a tuple containing the values that are unpacked from buffer according to the format string format.
Example:
import ustruct
Buffer = ustruct.pack('@Lf', 123456789, 3.14)
print(Buffer) ⇒ b'\x15\xcd[\x07\xc3\xf5H@'
Data = ustruct.unpack('@Lf', Buffer)
print(Data) ⇒ (123456789, 3.14)
print(Data[0]) ⇒ 123456789
print(Data[1]) ⇒ 3.14
Copy the following example into the Mu Editor and flash to the micro:bit. Examine the results in the REPL.
Example 1
# Demonstrate pack() & unpack() methods
# from the ustruct module.
import ustruct
# Some data to pack
Int = -125
Str = 'Hello'
print('Int =', Int, 'Str =', Str)
# Format string to pack the data.
format = 'i5s'
size = ustruct.calcsize(format)
print('Packed data size =', size, 'bytes')
# Pack the data
Buffer = ustruct.pack(format, Int, Str)
print('Packed data:')
print(Buffer)
# Unpack the data
Data = ustruct.unpack(format, Buffer)
print('Unpacked data:')
print(Data)
Results:
Int = -125 Str = Hello
Packed data size = 9 bytes
Packed data:
b'\x83\xff\xff\xffHello'
Unpacked data:
(-125, b'Hello')
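The packed bytes object has two components: the integer \x83\xff\xff\xff and the string Hello. The i part of the format string specifies a 4-byte signed integer. Negative values are stored in two's complement form, so -125 has the 32-bit pattern 0xFFFFFF83. On the little-endian micro:bit the least significant byte (\x83) is stored first, followed by three \xff bytes. These \xff bytes are sign-extension bytes, not nulls; a small positive value would be filled out with \x00 instead.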
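The bytes of the packed integer can be inspected with a short sketch. It computes the 32-bit two's complement pattern of -125 by masking, packs the value with an explicit little-endian code, and then reads the same four bytes back as an unsigned integer. The explicit '<' is an assumption so the result does not depend on the host's byte order.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

value = -125

# 32-bit two's complement bit pattern of the value.
pattern = value & 0xFFFFFFFF
print(hex(pattern))

# Little-endian: least significant byte (0x83) comes first.
packed = ustruct.pack('<i', value)
print(packed)

# The same four bytes interpreted as an unsigned integer.
as_unsigned = ustruct.unpack('<I', packed)[0]
print(as_unsigned == pattern)
```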
pack_into() and unpack_from() Methods
These two methods are much more powerful. They allow multiple 'structs' (records) to be packed or unpacked into or from a buffer. Any record can be accessed, updated or unpacked in any order. Additional records can be appended to the end of the last record in the buffer providing the buffer is large enough to accommodate them.
An important concept is the record offset. The offset can be defined as the number of bytes from the beginning of the buffer to the first byte of the record of interest. The offset is 0 indexed i.e. the first byte in the buffer is at offset = 0.
Consider the following data struct (record) in pseudocode to describe a book:
Book:
Title: str[25];
Author: str[25];
Published: integer;
Pages: integer;
Fiction: [0, 1];
END Book
Title and Author are strings with maximum length 25. Published is the publishing year of the current edition, Pages is the total page count. The field Fiction is boolean; True or False. Since MicroPython (unlike Python) does not have a ustruct typecode for boolean the simplest way to specify the value of this field is using 0 for False or 1 for True. When unpacked it can be converted to a boolean with the bool() function.
Taking the above example a Book record can be represented by the format string '25s25siiB'. If this string is passed to the ustruct.calcsize() method we learn that each book record packs into 61 bytes.
ustruct.calcsize('25s25siiB') ⇒ 61
Knowing the number of bytes per Book record, it is now possible to calculate the offset of each record. The first Book will have an offset = 0, the second an offset = 61, the third an offset = 122 and so on.
Record: Book 1 Book 2 Book 3
Offset: 0 61 122
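The offset arithmetic can be sketched as follows. The helper name record_offset is hypothetical and used only for illustration; the rule it implements is simply the record index multiplied by the record size.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

book_format = '25s25siiB'
book_size = ustruct.calcsize(book_format)

def record_offset(index, size):
    """Hypothetical helper: byte offset of the 0-indexed record."""
    return index * size

offsets = [record_offset(i, book_size) for i in range(3)]
print(book_size, offsets)
```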
With an understanding of offset the syntax of the two methods can be presented.
Syntax:
ustruct.pack_into(format, buffer, offset, v1, v2, ...)
Where:
Values v1, v2, ... are packed into buffer according to the format string format, starting at position offset. The parameter offset is required.
Example:
import ustruct
format = '2h'
size = ustruct.calcsize(format)
buffer = bytearray(2 * size)
offset = 0
ustruct.pack_into(format, buffer, offset, 2, 5)
offset += size
ustruct.pack_into(format, buffer, offset, 12, 15)
print(buffer) ⇒ bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')
offset += size
ustruct.pack_into(format, buffer, offset, 12, 15) ⇒ ValueError: buffer too small
The buffer used to pack the values must be:
- Writable and
- Predeclared and
- Of sufficient size to contain all data that is packed
The buffer is usually instantiated as a bytearray[5] object. The size of the buffer required is easily calculated in advance from the format string and the number of records or structs that will be packed. The example above demonstrates that in practice.
In the above example:
- The format string is declared. Each packed record will consist of two short integers (each two bytes in size)
- The size of each packed record is calculated.
- A buffer to hold two packed records is declared.
- The offset for the first record by definition will be 0.
- The first record is packed with values 2 and 5
- The offset for the second record is calculated and the values 12 and 15 are packed into buffer commencing at the end of the first record.
- The bytearray buffer now shows the four packed short integers. Note that since each of the four packed values only requires a single byte of storage space, the second byte of each is zero (\x00).
- Attempting to pack a third record into the buffer, which has only been declared with sufficient size for two records, raises a ValueError exception (error).
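One way to avoid the exception is to check that a record fits before packing it. The sketch below is a minimal illustration under that assumption; pack_record is a hypothetical helper, not part of the ustruct module, and it returns False instead of raising when the buffer is too small.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

record_format = '2h'
record_size = ustruct.calcsize(record_format)
buffer = bytearray(2 * record_size)  # room for exactly two records

def pack_record(buf, index, a, b):
    """Hypothetical helper: pack (a, b) as record `index` if it fits."""
    offset = index * record_size
    if offset + record_size > len(buf):
        return False  # record would run past the end of the buffer
    ustruct.pack_into(record_format, buf, offset, a, b)
    return True

results = [
    pack_record(buffer, 0, 2, 5),
    pack_record(buffer, 1, 12, 15),
    pack_record(buffer, 2, 22, 25),  # third record does not fit
]
print(results)
```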
Buffers that have been packed by the pack_into() method are unpacked with the unpack_from() method.
Syntax:
ustruct.unpack_from(format, buffer, offset=0)
Where:
The buffer is unpacked starting from position offset according to the format string format. The unpacked data is returned as a tuple. The parameter offset is optional and defaults to 0.
Example:
# This example uses the packed buffer from the ustruct.pack_into() example above.
import ustruct
format = '2h'
buffer = bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')
data = ustruct.unpack_from(format, buffer)
print(data) ⇒ (2, 5)
offset = ustruct.calcsize(format)
data = ustruct.unpack_from(format, buffer, offset)
print(data) ⇒ (12, 15)
This example is straightforward:
- The format string is declared. Each packed record consists of two short integers (each two bytes in size)
- The packed buffer (from the ustruct.pack_into() example above) is declared.
- The offset for the first record by definition will be 0. This does not need to be explicitly passed to the unpack_from() method.
- The first record is unpacked with values 2 and 5 returned as a tuple.
- The size of a packed record is calculated which gives the offset to the second record.
- The second record is unpacked with the values 12 and 15 returned as a tuple.
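The same steps extend to any number of records by stepping through the buffer one record size at a time. This is a minimal sketch; the explicit '<' in the format string is an assumption made here because the buffer literal is written in little-endian order.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

record_format = '<2h'
record_size = ustruct.calcsize(record_format)
buffer = bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')

# Visit each record offset in turn: 0, record_size, 2 * record_size, ...
records = []
for offset in range(0, len(buffer), record_size):
    records.append(ustruct.unpack_from(record_format, buffer, offset))

print(records)
```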
The next example is much more comprehensive. It packs and unpacks Book records using the structure from above when the offset concept was discussed.
Example 2
# Demonstrate the use of pack_into()
# and unpack_from() methods of the
# ustruct module.
# This program will pack
# and unpack 'Book' records.
# The 'Book' record struct is:
# Book:
# Title: str[25];
# Author: str[25];
# Published: integer;
# Pages: integer;
# Fiction: [0, 1];
# END Book
import ustruct
# Get size of each 'Book' record.
format = ('25s25siiB')
size = ustruct.calcsize(format)
print('Book packed size:', size)
# 3 x 'Book' records
Titles = ('Building Android Apps',
'Slow Cooker',
'Final Act')
Authors = ('Mike McGrath',
'Sara Lewis',
'J.M. Gregson')
Published = (2013, 2007, 2016)
Pages = (192, 128, 198)
Fiction = (0, 0, 1)
# Pack the three 'Book' records
# into the buffer.
buffer = bytearray(3 * size)
for i in range(3):
    offset = i * size
    ustruct.pack_into(format, buffer, offset,
                      Titles[i],
                      Authors[i],
                      Published[i],
                      Pages[i],
                      Fiction[i])
# Read back record for the second 'Book'.
offset = size * 1 # Second book offset
book = ustruct.unpack_from(format, buffer, offset)
# 'Title' and 'Author' are bytes objects
# so must convert to strings
# with decode() before printing.
print('Title:', book[0].decode())
print('Author:', book[1].decode())
print('Published:', book[2])
print('Pages:', book[3])
print('Fiction:', bool(book[4]))
Output:
Book packed size: 61
Title: Slow Cooker
Author: Sara Lewis
Published: 2007
Pages: 128
Fiction: False
This example has three Book records (stored as a series of tuples) that are packed into a bytearray buffer. Then the second record is unpacked and displayed.
Note that strings (Title and Author) are unpacked as bytes objects. The decode() method is used to convert them back to strings. The bool() function is used to convert Fiction from an integer to a boolean.
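Because each string field is padded to 25 bytes, the unpacked bytes objects still carry the trailing \x00 padding. A minimal sketch of one way to remove it before decoding follows. The helper name clean is hypothetical, and bytes literals are used when packing as an assumption so the sketch also runs on desktop Python.

```python
try:
    import ustruct  # MicroPython
except ImportError:
    import struct as ustruct  # assumption: CPython fallback for desktop testing

book_format = '25s25siiB'
book_size = ustruct.calcsize(book_format)

# Pack two records into one buffer.
buffer = bytearray(2 * book_size)
ustruct.pack_into(book_format, buffer, 0,
                  b'Slow Cooker', b'Sara Lewis', 2007, 128, 0)
ustruct.pack_into(book_format, buffer, book_size,
                  b'Final Act', b'J.M. Gregson', 2016, 198, 1)

def clean(field):
    """Hypothetical helper: strip the null padding and return a str."""
    return field.rstrip(b'\x00').decode()

# Read back the second record.
book = ustruct.unpack_from(book_format, buffer, book_size)
title = clean(book[0])
author = clean(book[1])
is_fiction = bool(book[4])
print(repr(title), repr(author), is_fiction)
```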