MicroPython 'ustruct' Module


Introduction

Traditionally the Python module struct is used to convert between Python values and C structs represented as bytes objects. The MicroPython equivalent is the module ustruct, which implements nearly all of the functions found in the Python module[1].

The ustruct module is simple and easy to use. It exposes five methods:

  • ustruct.calcsize()
  • ustruct.pack()
  • ustruct.unpack()
  • ustruct.pack_into()
  • ustruct.unpack_from()

It is necessary to use an import statement, for example:

import ustruct

before MicroPython programs can access the module's methods.

In MicroPython, import ustruct and import struct are equivalent and can be used interchangeably. However, for consistency this tutorial will use the module name ustruct. The ‘u’ in ustruct symbolizes the ‘Micro’ in MicroPython.

While many microcontroller programmers may never need this module, ustruct does have its uses. Complex record structures can be defined and used to pack data into a very memory-efficient binary format. Buffers formatted with ustruct (bytearrays) are a very easy way to pack or unpack data records in a random-access manner.

Uses include (but are not limited to) reading and writing binary log files, reformatting binary data from a sensor, and buffering data to go to an output device.

ustruct Format Codes

The ustruct module packs (and unpacks) numerical and string values into (and from) binary data. The structure of this binary data is defined by format strings composed of standardised codes.

Byte Order Codes

The first character of the format string determines the order in which the bytes are packed. There are two possibilities: little-endian and big-endian.

A big-endian system stores the most significant byte (MSB) of the data at the smallest memory address and the least significant byte (LSB) at the largest.

A little-endian system operates in reverse, with the LSB stored at the smallest address and the MSB stored at the largest address. An example best explains this:


Consider the base10 (decimal) value: 123456789
It has the hexadecimal value: 0x075BCD15
This can be broken down into 4 x bytes: 07, 5B, CD, 15
                                       MSB → → → → → LSB

The order that these four bytes are stored in memory
depends on whether the ordering is little-endian
or big-endian.

Memory Location: x01 x02 x03 x04 
     big-endian:  07  5B  CD  15
  little-endian:  15  CD  5B  07
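
The same example can be expressed in code as a minimal sketch. Packing 123456789 with the explicit big-endian ('>') and little-endian ('<') byte order codes from Table 1 reproduces the two byte orders shown above. The try/except import that falls back to CPython's struct module is an addition so the sketch can also run on a desktop Python. On the micro:bit only the first import is used.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

value = 123456789                # 0x075BCD15

big = ustruct.pack('>I', value)     # '>' forces big-endian
little = ustruct.pack('<I', value)  # '<' forces little-endian

print('big-endian:   ', big)     # b'\x07[\xcd\x15'  (0x5B prints as '[')
print('little-endian:', little)  # b'\x15\xcd[\x07'
```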
          

The MicroPython usys module[2] provides the byteorder attribute, which reports whether the host system is big-endian or little-endian. Its value is the string 'big' or 'little' respectively.


Example 1
# Report whether this system is
# big-endian or little-endian.

import usys
s = usys.byteorder + '-endian'
print('micro:bit byte order is:', s)

Output:
micro:bit byte order is: little-endian
          

Table 1 lists and describes the byte order codes.

Table 1: First character of ustruct formatting string
Character   Byte Order
@           native
<           little-endian
>           big-endian
!           network (always big-endian)

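The byte order code may be omitted from the format string, in which case it defaults to the @ (native) code and the byte order of the current system is used. Native mode also uses the platform's native sizes and alignment, so padding bytes may be inserted between fields. The <, > and ! codes use standard sizes with no padding.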
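
The relationships in Table 1 can be checked with a short sketch. It assumes usys (or, under CPython, sys) reports byteorder. Omitting the code gives the same result as '@', '!' matches '>', and native order matches whichever of '<' or '>' the host uses. The value 0x0102 is chosen only because its two bytes are easy to tell apart.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

try:
    import usys as sys           # MicroPython
except ImportError:
    import sys                   # CPython

value = 0x0102                   # MSB is 0x01, LSB is 0x02

default = ustruct.pack('H', value)   # no byte order code
native = ustruct.pack('@H', value)   # explicit native order
little = ustruct.pack('<H', value)   # b'\x02\x01'
big = ustruct.pack('>H', value)      # b'\x01\x02'
network = ustruct.pack('!H', value)  # b'\x01\x02'

print(default == native)   # True: '@' is the default
print(network == big)      # True: network order is big-endian
# Native order matches '<' or '>' depending on the host.
print(native == (little if sys.byteorder == 'little' else big))  # True
```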

Data Type Codes

Following the byte order code (if specified) are any number of data type formatting characters. These describe in detail the structure of the binary data to be packed or unpacked. Table 2 lists and describes every code available with the micro:bit's version of MicroPython.

The packed size is given in bytes.

Table 2: ustruct formatting string data type characters
Character   C Type               Python Type   Packed Size
b           signed char          integer       1
B           unsigned char        integer       1
h           short                integer       2
H           unsigned short       integer       2
i           int                  integer       4
I           unsigned int         integer       4
l           long                 integer       4
L           unsigned long        integer       4
q           long long            integer       8
Q           unsigned long long   integer       8
f           float                float         4
d           double               float         8
s           char[]               bytes[3]      set by the prefix (e.g. '5s' = 5)

The 's' format code provides a way to pack strings into a binary structure. The number of bytes allocated in the packed structure is specified with an integer prefix. If the allocated size is larger than the string, the binary data is padded out with nulls, i.e. \x00.


's' Example A
Format string:  '<5s'
String to pack: 'Hello'
Packed string:  b'Hello'

's' Example B
Format string:  '<10s'
String to pack: 'Hello'
Packed string:  b'Hello\x00\x00\x00\x00\x00'

's' Example C
Format string:  '<2s'
String to pack: 'Hello'
Packed string:  b'He'
          

All three examples force the use of little-endian with the '<' byte order code.

Example A shows the string packed to its exact size i.e. 5 bytes.

In Example B the space (10 bytes) allocated for the string is larger than the requirement for the string (5 bytes). The packed structure is padded out with 5 nulls (\x00).

In Example C there is insufficient space (only 2 bytes) allocated for the 5-byte string. No exception (error) is raised. Instead, the string is truncated to fit the space allocated by the format string.
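
The three 's' examples can be reproduced as a sketch. It packs a bytes object (b'Hello') rather than a str. MicroPython accepts a str here, as this tutorial's own examples do, but CPython's struct module only accepts bytes for 's'. The sketch also shows that unpacking returns the null padding, which can be removed with rstrip() before decoding.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

text = b'Hello'

exact = ustruct.pack('<5s', text)      # b'Hello'
padded = ustruct.pack('<10s', text)    # b'Hello\x00\x00\x00\x00\x00'
truncated = ustruct.pack('<2s', text)  # b'He'

# Unpacking returns the padding as well, so strip it before decoding.
field, = ustruct.unpack('<10s', padded)
print(field)                           # b'Hello\x00\x00\x00\x00\x00'
print(field.rstrip(b'\x00').decode())  # Hello
```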

Here are a couple of general examples of ustruct formatting strings:


Example A
'<if3s'
⇒ little-endian byte order,
  a 4-byte signed integer,
  followed by a 4-byte float,
  followed by a 3-character string.

Example B
'@3L2dH'
⇒ native byte order,
  3 x 4-byte unsigned integers,
  followed by 2 x 8-byte floats,
  followed by a 2-byte unsigned integer.
          

Note the numeric prefixes in Example B, which indicate that more than one of the specified data type follows.

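It is important not to confuse signed and unsigned integers. If a packed unsigned integer is unpacked as a signed integer (or vice versa), the same bytes are interpreted differently. Any value whose most significant bit is set comes back with a different sign or magnitude.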
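
A short sketch of the mismatch: the same bytes give different values depending on whether a signed or an unsigned code is used to unpack them. The CPython import fallback is an addition for running off-device.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

packed = ustruct.pack('<b', -1)      # one signed byte: b'\xff'
print(ustruct.unpack('<b', packed))  # (-1,)   read back as signed
print(ustruct.unpack('<B', packed))  # (255,)  same byte read as unsigned

packed = ustruct.pack('<H', 65411)   # unsigned short 0xFF83: b'\x83\xff'
print(ustruct.unpack('<h', packed))  # (-125,) same bytes read as signed
```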

Attempting to pack an integer that is too large for the defined binary field will not cause an error[4]. Instead, the integer is packed into the defined space and the most significant bytes that don't fit are dropped.
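
Because out-of-range values are silently truncated, it can be worth range-checking before packing. The sketch below uses a hypothetical helper, pack_checked(), and a hypothetical RANGES table that covers a few integer codes in the standard ('<', '>', '!') modes. Neither is part of ustruct. CPython's struct module raises struct.error for out-of-range values instead of truncating, so a check like this also keeps behaviour consistent between the two.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

# Hypothetical table: (minimum, maximum) for some integer codes
# when a standard byte order code ('<', '>' or '!') is used.
RANGES = {
    'b': (-128, 127),
    'B': (0, 255),
    'h': (-32768, 32767),
    'H': (0, 65535),
    'i': (-2147483648, 2147483647),
    'I': (0, 4294967295),
}


def pack_checked(order, code, value):
    """Hypothetical helper: pack one integer, raising ValueError
    instead of silently truncating an out-of-range value."""
    low, high = RANGES[code]
    if not low <= value <= high:
        raise ValueError('%d does not fit in code %s' % (value, code))
    return ustruct.pack(order + code, value)


print(pack_checked('<', 'B', 200))   # b'\xc8'
# Per the truncation described above, 300 in one byte would keep
# only its low byte:
print(300 & 0xFF)                    # 44
try:
    pack_checked('<', 'B', 300)
except ValueError as e:
    print('Rejected:', e)            # Rejected: 300 does not fit in code B
```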

calcsize() Method

Often it is necessary to know how many bytes data packed with a given format string will occupy. This becomes important when using the pack_into() method, which is described with examples in the next section.

The calcsize() method is provided for this purpose. It is simple to use:


Syntax:
ustruct.calcsize(format)

Where:
  format : is a ustruct format string.

Examples:
ustruct.calcsize('<if3s') ⇒ 11
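ustruct.calcsize('@3L2dH') ⇒ 34
(30 bytes of data plus 4 padding bytes that
native mode inserts so the doubles start on an
8-byte boundary; ustruct.calcsize('<3L2dH') ⇒ 30)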
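
To show where these numbers come from, here is a hypothetical re-implementation, standard_size(). It adds up the Table 2 sizes for a format string that uses a standard byte order code ('<', '>' or '!'). It deliberately rejects native mode, where alignment padding would also have to be modelled, and it only knows the codes listed in Table 2. It is compared against the real calcsize().

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

# Packed sizes from Table 2 (standard sizes, no padding).
SIZES = {'b': 1, 'B': 1, 'h': 2, 'H': 2, 'i': 4, 'I': 4,
         'l': 4, 'L': 4, 'q': 8, 'Q': 8, 'f': 4, 'd': 8}


def standard_size(fmt):
    """Hypothetical calcsize() for formats starting with '<', '>' or '!'."""
    if not fmt or fmt[0] not in '<>!':
        raise ValueError('only standard byte order codes are handled')
    total = 0
    count = ''                       # digits of a numeric prefix, if any
    for ch in fmt[1:]:
        if ch.isdigit():
            count += ch
            continue
        n = int(count) if count else 1
        if ch == 's':
            total += n               # 's': the prefix is the length in bytes
        else:
            total += n * SIZES[ch]   # other codes: the prefix repeats the code
        count = ''
    return total


for fmt in ('<if3s', '<3L2dH', '<25s25siiB'):
    print(fmt, standard_size(fmt), ustruct.calcsize(fmt))
# <if3s 11 11
# <3L2dH 30 30
# <25s25siiB 59 59
```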
          

Packing and Unpacking Data

This module is all about packing numerical and string data into an efficient binary format and correspondingly being able to unpack the binary structure back to the required native format.

The pack() method returns the packed binary structure as a bytes object. The pack_into() method writes it into an existing buffer instead.

The ustruct module provides two pairs of pack/unpack methods to achieve this: pack() / unpack() and pack_into() / unpack_from().

pack() and unpack() Methods

These two methods are the simplest of the pack/unpack combinations.


Syntax:
ustruct.pack(format, v1, v2...)
Returns a bytes object with values v1, v2...
packed using the format string format.

ustruct.unpack(format, buffer)
Returns a tuple containing the values that
are unpacked from buffer according to the
format string format.

Example:
import ustruct
Buffer = ustruct.pack('@Lf', 123456789, 3.14)
print(Buffer)
⇒ b'\x15\xcd[\x07\xc3\xf5H@'

Data = ustruct.unpack('@Lf', Buffer)
print(Data)    ⇒ (123456789, 3.14)
print(Data[0]) ⇒ 123456789
print(Data[1]) ⇒ 3.14
          

Copy the following example into the Mu Editor and flash to the micro:bit. Examine the results in the REPL.

Example 1

# Demonstrate pack() & unpack() methods
# from the ustruct module.

import ustruct

# Some data to pack
Int = -125
Str = 'Hello'
print('Int =', Int, 'Str =', Str)

# Format string to pack the data.
format = 'i5s'
size = ustruct.calcsize(format)
print('Packed data size =', size, 'bytes')

# Pack the data
Buffer = ustruct.pack(format, Int, Str)
print('Packed data:')
print(Buffer)

# Unpack the data
Data = ustruct.unpack(format, Buffer)
print('Unpacked data:')
print(Data)
          
Results:

Int = -125 Str = Hello
Packed data size = 9 bytes
Packed data:
b'\x83\xff\xff\xffHello'
Unpacked data:
(-125, b'Hello')
          

The packed bytes object has two components: the integer \x83\xff\xff\xff and the string Hello. The i part of the format string specifies a 4-byte signed integer. The value -125 is stored in two's complement form as 0xFFFFFF83. Because the micro:bit is little-endian, the least significant byte \x83 comes first. It is followed by three \xff bytes produced by sign extension (these are not nulls).
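
The sign-extension bytes can be seen directly in a sketch. It masks -125 to 32 bits, which gives its two's-complement form, and then packs it with an explicit little-endian code.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

value = -125
# Masking with 0xFFFFFFFF gives the 32-bit two's-complement bit pattern.
as_unsigned = value & 0xFFFFFFFF
print(hex(as_unsigned))           # 0xffffff83

# '<i' stores the least significant byte first: 83 ff ff ff
print(ustruct.pack('<i', value))  # b'\x83\xff\xff\xff'
```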

pack_into() and unpack_from() Methods

These two methods are much more powerful. They allow multiple 'structs' (records) to be packed into or unpacked from a single buffer. Any record can be accessed, updated or unpacked in any order. Additional records can be appended after the last record in the buffer, provided the buffer is large enough to accommodate them.

An important concept is the record offset: the number of bytes from the beginning of the buffer to the first byte of the record of interest. Offsets are zero-based, i.e. the first byte in the buffer is at offset = 0.

Consider the following data struct (record) in pseudocode to describe a book:


Book:
  Title: str[25];
  Author: str[25];
  Published: integer;
  Pages: integer;
  Fiction: [0, 1];
END Book
          

Title and Author are strings with a maximum length of 25. Published is the publishing year of the current edition and Pages is the total page count. The field Fiction is boolean: True or False. MicroPython's ustruct does not have a typecode for boolean (unlike CPython's struct, which has the '?' code). The simplest way to store this field is therefore 0 for False or 1 for True. When unpacked, it can be converted to a boolean with the bool() function.

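Taking the above example, a Book record can be represented by the format string '25s25siiB'. Passing this string to the ustruct.calcsize() method shows that each Book record packs into 61 bytes. That is 50 bytes for the two strings, 2 padding bytes so that the first integer is 4-byte aligned (the format uses native mode), 8 bytes for the two integers and 1 byte for the unsigned char.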


ustruct.calcsize('25s25siiB') ⇒ 61
          

Knowing the number of bytes per Book record, it is now possible to calculate the offset of each record. The first Book has offset = 0, the second offset = 61, the third offset = 122 and so on.

  Book 1    Book 2     Book 3  
0        61        122
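
The offset arithmetic can be sketched as follows. BOOK_FORMAT is simply a name chosen here for the format string above. The comments assume the 61-byte record size reported on the micro:bit.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

BOOK_FORMAT = '25s25siiB'
size = ustruct.calcsize(BOOK_FORMAT)   # 61 on the micro:bit

# The offset of record n (counting from 0) is n * size.
offsets = [n * size for n in range(3)]
print(offsets)                         # [0, 61, 122] when size is 61

# A buffer for all three records needs 3 * size bytes.
buffer = bytearray(3 * size)
print(len(buffer))                     # 183 when size is 61
```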
          

With an understanding of offset the syntax of the two methods can be presented.


Syntax
ustruct.pack_into(format, buffer, offset, v1, v2, ...)

Where:
    Values v1, v2, ... are packed into buffer
    according to the string format, starting at
    position offset.

    The parameter offset is required.

Example:
import ustruct
format = '2h'                    # 2 short integers per record
size = ustruct.calcsize(format)  # 4 bytes per record
buffer = bytearray(2 * size)     # room for 2 records
offset = 0
ustruct.pack_into(format, buffer, offset, 2, 5)
offset += size
ustruct.pack_into(format, buffer, offset, 12, 15)
print(buffer)
offset += size                   # offset of a 3rd record
ustruct.pack_into(format, buffer, offset, 12, 15)
⇒ bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')
⇒ ValueError: buffer too small
          

The buffer used to pack the values must be:

  1. Writable and
  2. Predeclared and
  3. Of sufficient size to contain all data that is packed

The buffer is usually instantiated as a bytearray[5] object. The size of the buffer required is easily calculated in advance from the format string and the number of records or structs that will be packed. The example above demonstrates that in practice.

In the above example:

  • The format string is declared. Each packed record will consist of two short integers (each two bytes in size)
  • The size of each packed record is calculated.
  • A buffer to hold two packed records is declared.
  • The offset for the first record by definition will be 0.
  • The first record is packed with values 2 and 5
  • The offset for the second record is calculated and the values 12 and 15 are packed into buffer commencing at the end of the first record.
  • The bytearray buffer now shows the four packed short integers. Each value fits in a single byte and the byte order is little-endian, so each short is stored as its value byte followed by a null (\x00) high-order byte.
  • Attempting to pack a third record into a buffer that was only sized for two records raises a ValueError exception (error).
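
One way to avoid running past the end of the buffer is to check the space before calling pack_into(). The sketch below uses a hypothetical helper, pack_record(), that computes the offset from a record index and returns False when the record would not fit. It uses '<2h' rather than '2h' so that the byte order is fixed on any host. On the little-endian micro:bit both give the same bytes. On CPython the overflow case raises struct.error rather than ValueError, which is another reason to check first.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython


def pack_record(fmt, buffer, index, *values):
    """Hypothetical helper: pack record number `index` into `buffer`.

    Returns False instead of raising when the record would not fit.
    """
    size = ustruct.calcsize(fmt)
    offset = index * size
    if offset + size > len(buffer):
        return False
    ustruct.pack_into(fmt, buffer, offset, *values)
    return True


fmt = '<2h'
buffer = bytearray(2 * ustruct.calcsize(fmt))  # room for two records
print(pack_record(fmt, buffer, 0, 2, 5))       # True
print(pack_record(fmt, buffer, 1, 12, 15))     # True
print(pack_record(fmt, buffer, 2, 22, 25))     # False: no room for a third
print(buffer)  # bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')
```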

Buffers that have been packed by the pack_into() method are unpacked with the unpack_from() method.


Syntax
ustruct.unpack_from(format, buffer, offset=0)

Where:
    The buffer is unpacked starting from
    position offset according to the 
    string format. The unpacked data is
    returned as a tuple.

    The parameter offset is optional and
    by default is 0.


Example:
# This example uses the packed buffer from
# the ustruct.pack_into() example above.
import ustruct
format = '2h'
buffer = bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')
data = ustruct.unpack_from(format, buffer)
print(data)
offset = ustruct.calcsize(format)
data = ustruct.unpack_from(format, buffer, offset)
print(data)
⇒ (2, 5)
⇒ (12, 15)
          

This example is straightforward:

  • The format string is declared. Each packed record consists of two short integers (each two bytes in size)
  • The packed buffer (from the ustruct.pack_into() example above) is declared.
  • The offset for the first record by definition will be 0. This does not need to be explicitly passed to the unpack_from() method.
  • The first record is unpacked with values 2 and 5 returned as a tuple.
  • The size of a packed record is calculated which gives the offset to the second record.
  • The second record is unpacked with the values 12 and 15 returned as a tuple.
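
Because every record has the same size, all records in a buffer can be unpacked by stepping the offset in increments of calcsize(). Here is a sketch, again using '<2h' so that the byte order matches the little-endian buffer on any host:

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

fmt = '<2h'      # explicit little-endian, as on the micro:bit
buffer = bytearray(b'\x02\x00\x05\x00\x0c\x00\x0f\x00')
size = ustruct.calcsize(fmt)

records = []
for offset in range(0, len(buffer), size):
    records.append(ustruct.unpack_from(fmt, buffer, offset))

print(records)   # [(2, 5), (12, 15)]
```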

The next example is much more comprehensive. It packs and unpacks Book records using the structure from above when the offset concept was discussed.

Example 2

# Demonstrate the use of pack_into()
# and unpack_from() methods of the
# ustruct module.

# This program will pack
# and unpack 'Book' records.
# The 'Book' record struct is:
#   Book:
#    Title: str[25];
#    Author: str[25];
#    Published: integer;
#    Pages: integer;
#    Fiction: [0, 1];
#   END Book

import ustruct

# Get size of each 'Book' record.
format = '25s25siiB'
size = ustruct.calcsize(format)
print('Book packed size:', size)

# 3 x 'Book' records
Titles = ('Building Android Apps',
          'Slow Cooker',
          'Final Act')
Authors = ('Mike McGrath',
           'Sara Lewis',
           'J.M. Gregson')
Published = (2013, 2007, 2016)
Pages = (192, 128, 198)
Fiction = (0, 0, 1)

# Pack the three 'Book' records
# into the buffer.
buffer = bytearray(3 * size)
for i in range(3):
    offset = i * size
    ustruct.pack_into(format, buffer, offset,
                      Titles[i],
                      Authors[i],
                      Published[i],
                      Pages[i],
                      Fiction[i])

# Read back record for the second 'Book'.
offset = size * 1 # Second book offset
book = ustruct.unpack_from(format, buffer, offset)
# 'Title' and 'Author' are bytes objects
# padded with nulls (\x00), so strip the
# padding and convert them to strings
# with decode() before printing.
print('Title:', book[0].rstrip(b'\x00').decode())
print('Author:', book[1].rstrip(b'\x00').decode())
print('Published:', book[2])
print('Pages:', book[3])
print('Fiction:', bool(book[4]))
          
Output:

Book packed size: 61
Title: Slow Cooker
Author: Sara Lewis
Published: 2007
Pages: 128
Fiction: False
          

This example stores the fields of three Book records in a series of tuples and packs the records into a bytearray buffer. It then unpacks and displays the second record.

Note that the strings (Title and Author) are unpacked as bytes objects that include any \x00 padding. The decode() method converts them back to strings, and the bool() function converts Fiction from an integer to a boolean.
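
A single record can also be updated in place, as noted at the start of the pack_into() section. Below is a sketch of a read-modify-write update. It uses a hypothetical helper, update_pages(), that unpacks one Book record, changes its Pages field and packs it back at the same offset. Titles are passed as bytes so that the sketch also runs under CPython.

```python
try:
    import ustruct               # MicroPython module name
except ImportError:
    import struct as ustruct     # fallback when run under CPython

fmt = '25s25siiB'                # the Book record format from above
size = ustruct.calcsize(fmt)
buffer = bytearray(3 * size)     # room for three Book records

# Pack the second Book record (index 1).
ustruct.pack_into(fmt, buffer, 1 * size,
                  b'Slow Cooker', b'Sara Lewis', 2007, 128, 0)


def update_pages(buffer, index, pages):
    """Hypothetical helper: rewrite the Pages field of one record."""
    offset = index * size
    fields = list(ustruct.unpack_from(fmt, buffer, offset))
    fields[3] = pages                        # Pages is the fourth field
    ustruct.pack_into(fmt, buffer, offset, *fields)


update_pages(buffer, 1, 144)
book = ustruct.unpack_from(fmt, buffer, 1 * size)
print(book[0].rstrip(b'\x00').decode(), book[3])  # Slow Cooker 144
```

Only the bytes of the chosen record are rewritten. The other records in the buffer are left untouched.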