tialloc : a small memory allocator

Memory allocators have an overhead. Usually on the order of around 12bytes. This overhead stores the size of the allocation and other information the allocation needs to maintain and free the allocation. Memory allocators also often allocate more memory than you request, either for architecture specific reasons (alignment) or as part of the allocation method (see here).

This is usually fine. But it’s a problem if you’re doing a lot of very small allocations (say less than 100bytes) and you want to make efficient use of memory. For this reason I have started developing an efficient memory allocator for small allocations (1byte to 100odd bytes). Whereas a traditional allocator would have an overhead of a few hundred percent for a 1 byte allocation this allocator should have an overhead of around 2%. There are of course downsides, one being that the memory will in many cases not be word aligned. This may case performance issues on some platforms. The allocator is also currently designed only to work with 32bit systems.

The allocator works by grouping all allocations of a particular size in the same block, as such the allocator only needs to store the allocation size once for each block.

The diagram bellow shows an allocation block. All allocation blocks are 256bytes long. The allocation blocks are also aligned to 256 boundaries in memory (this is important).

Diagram 1: A block containing 1 byte allocations

Diagram 1 shows a newly allocated block for the storage of 1 byte allocations. The prev_free_block and next_free_block pointers are used to chain blocks together in memory. They point to other free blocks which have unallocated cells. f_s and f_e (free_chain_start and free_chain_end in the code below) point to a linked list of free cells within this block. At initialisation all cells are unallocated. And blocks are chained together. This chaining, also allows for non-contiguous allocation of blocks (again bearing in mind they must be 256byte aligned).

Diagram 2: Chaining of blocks

A pointer (free_block_ptr) is maintained, this points to the last block in the chain. A chain and free_block_ptr is maintained for every size our allocator is working on. Each block contains the allocation size (a in the diagrams).

The chaining of cells within the block varies based on allocation size. Below you can see the chaining for allocations of size 4.

Diagram 3: Allocations of size 4

So at initialisation we have multiple chains of blocks, one for each allocation size. Each chain pointed to by a free_block_ptr.

Allocation and freeing memory proceeds are follows:


First, get the free_block_ptr for the allocation size requested. Then if…

There is only one free cell in the free_block_ptr block:
If there is only one free cell in this block, this is the cell that will be returned. The block is unlinked from the chain (prev_free_block and next_free_block blocks are linked together). free_block_ptr is set to the value of prev_free_block.

free_chain_start and free_chain_end are set to 0xFF to indicate there are no free cells in this block.

There’s more than one free cell in the free_block_ptr block:
Get free_chain_end (f_e in the diagrams above) return this cell as the allocation. Set free_chain_end to be the value pointed to by the cell free_chain_end (i.e. data[free_chain_end]).

There are no free cells in the free_block_ptr block:
Initialise more blocks of this size (generally using malloc). Set free_block_ptr to the last of the blocks in this chain. Perform initialisation again.

Freeing memory

Optionally add a check here to see if the memory was allocated by tialloc, tialloc and malloc can co-exist on many systems. Where malloc_size can be used to determine if a block was allocated using malloc (see here.

1. Locate the start of the block which contained the address we are freeing (addr). This is possible because blocks are aligned to 256 byte boundaries. So addr – addr%256 is the start of the block. We can now access all block data for the block in which this address is contained. Then if…

There are free cells in the block:
There are already free cells in the block, set the freed item base byte to point to free_chain_end (f_e). Set free_chain_end to the freed items data item value.

There are no free cells in the current block:
Set free_chain_start and free_chain_end (f_s and f_e) to the data index used by addr. Set the next_free_block pointer of the block pointed to by free_block_ptr to this block. Set free_block_ptr to this block. This adds the block on to the free blocks chain.


As far as the algorithm description goes that’s how things work, these are some probably some implementation issues I’ve not mentioned. Currently my implementation does free whole blocks once all the cells are free within that block. It would be possible to do this, either by periodically garbage collecting, or checking all items within the allocation unit when a block is freed.

It’s worth noting that this method probably has significant performance implications. Particularly as the data is non-word aligned and many 64bit architectures don’t like this. I’ve also opted to make this a 32bit implementation only are present, this is to avoid the use of 64bit pointers which would almost double my overhead.

The code is currently implemented as a C++ class, this implementation was designed to test the algorithm and I have not yet used it in anger.

This code is NOT finished, and it fairly scrappy. Use it at your own risk. You may use it under a GPL3 style license. If you feel it’s useful and you want to use it under a different license please contain me at new at sgenomics dot org.

Here’s tialloc.h:

#include <stdint.h>
#include <malloc/malloc.h>
#include <iostream>
#include <vector>

using namespace std;

struct small_block {
  small_block *next_free_block;
  small_block *prev_free_block;
   uint8_t         free_chain_start;   // 0xFF means unallocated, otherwise index in to data.
   uint8_t         free_chain_end;     // 0xFF means unallocated, otherwise index in to data.
   uint8_t         alloc_size;

   uint8_t         data[245];
} __attribute__((__packed__));

class tialloc {

  small_block *free_block_ptr[123]; // pointers to last_free_blocks
  int32_t alloc_size;


  tialloc() {
    for(int n=1;n<123;n++) {

  // Return true if this block was allocated by tialloc, otherwise false (malloc'd)
  bool is_tiallocated(void *addr) {
    if(((int32_t)addr)%4 == 0) {  // all malloc'd blocks will be word aligned.
      void *aligned_addr = (void *)(((int32_t)addr) - (int32_t)addr%4);
      long int size = malloc_size(aligned_addr); // ask malloc for it's size.
      if(size != 0) {
        // This block was malloc'd
        return false;

    return true;

  // get the start of the tialloc block from any pointer within it.
  small_block *get_tiblock_start(void *addr) {
    return (small_block *) ((int32_t)addr - (int32_t) addr%256);

  // from a pointer to a data item in a tialloc block, return the data item index
  uint8_t ptr_to_data_idx(void *addr) {

    small_block *block_addr = get_tiblock_start(addr);

    return ((int32_t)addr)-(int32_t)(block_addr->data);
    //return addr-block_addr-11; 

  bool any_free(small_block *addr) {

    if(addr == 0x0) return false; // nothing free!

    if((addr->free_chain_start == 0xFF) &amp;amp;amp;amp;amp;amp;amp;&amp;amp;amp;amp;amp;amp;amp; (addr->free_chain_end == 0xFF)) return false;
    return true;

  void free(void *addr) {

    if(!is_tiallocated(addr)) {free(addr); return;}

    small_block *this_block = get_tiblock_start(addr);

    // are there existing free elements in this block?
    if(any_free(this_block)) {
      // there are existing free items

      //cout << "free: there are existing free items" << endl;
      uint8_t d_temp = this_block->free_chain_end;
      uint8_t free_item_idx = ptr_to_data_idx(addr);
      this_block->data[free_item_idx] = d_temp;

      this_block->free_chain_end = free_item_idx;
    } else {
      // there are no existing free items

      //cout << "free: there are no existing free items" << endl;
      small_block *fb_temp = free_block_ptr[this_block->alloc_size];

      free_block_ptr[this_block->alloc_size] = get_tiblock_start(addr);

      this_block->free_chain_start = ptr_to_data_idx(addr);
      this_block->free_chain_end   = ptr_to_data_idx(addr);

      fb_temp->next_free_block = this_block;

  void *alloc(int32_t n_alloc_size=1) {

    // are there any free items left?
    if(any_free(free_block_ptr[n_alloc_size])) {

      // only one item left?
      if(free_block_ptr[n_alloc_size]->free_chain_start == free_block_ptr[n_alloc_size]->free_chain_end) {
        cout << "alloc: only one item left" << endl;
        cout << "free_chain_start/end: " << (int) free_block_ptr[n_alloc_size]->free_chain_start << endl;
        int8_t ret_idx = free_block_ptr[n_alloc_size]->free_chain_end; // free_block_ptr->data[free_block_ptr->free_chain_end];
        free_block_ptr[n_alloc_size]->free_chain_start = 0xFF;
        free_block_ptr[n_alloc_size]->free_chain_end   = 0xFF;

        small_block *free_block_tmp = free_block_ptr[n_alloc_size]->prev_free_block;
        free_block_ptr[n_alloc_size]->prev_free_block  = 0;
        free_block_ptr[n_alloc_size]->next_free_block  = 0;

        void *retval = free_block_ptr[n_alloc_size]->data + ret_idx;
        free_block_ptr[n_alloc_size] = free_block_tmp;

        return retval;
      } else {
        // more than one item left.
        cout << "alloc: more than one item left" << endl;

        uint8_t ret_idx = free_block_ptr[n_alloc_size]->free_chain_end;
        cout << "ret_idx: " << (int) ret_idx << endl;
        cout << "free_block_ptr: " << free_block_ptr[n_alloc_size] << endl;

        free_block_ptr[n_alloc_size]->free_chain_end = free_block_ptr[n_alloc_size]->data[free_block_ptr[n_alloc_size]->free_chain_end];

        void *retval = (void *) (((int32_t)free_block_ptr[n_alloc_size]->data) + ret_idx);
        return retval;

    } else {

      // allocate more blocks here
      cout << "reached block limit, allocate more" << endl;
      return alloc(n_alloc_size);


  void initialise(int n_alloc_size = 1) {

    alloc_size = 1024; //1024*1024*10;
    // malloc a bunch of memory

    // memory should really only be local, class member for debugging.
    uint8_t *memory = (uint8_t *)malloc(alloc_size); // 10Mb

    cout << "allocated: " << (small_block *) memory << endl;


    for(;((int32_t)memory)%256 !=0;) {memory++; alloc_size--; cout << "padding" << endl;} // pad allocation to be a multiple of 256
    cout << "padding complete" << endl;

    small_block t;
    t.prev_free_block  = 0;
    t.next_free_block  = 0;
    t.alloc_size       = n_alloc_size;
    t.free_chain_start = 0;
    t.free_chain_end   = 244 - (244%n_alloc_size);
    for(int n=n_alloc_size;n<245;n=n+n_alloc_size) {
      t.data[n] = n-n_alloc_size;
    t.data[0] = 0xFF;
    cout << "built template" << endl;

    void *current_addr = memory;
    int n_limit = ((alloc_size/256)-1);
    cout << "n limit: " << n_limit << endl;
    for(int n=0;n<n_limit;n++) {
      // cout << "n: " << n << endl;
      t.prev_free_block = (small_block *)(((int32_t)current_addr)-256);
      t.next_free_block = (small_block *)(((int32_t)current_addr)+256);
      if(n==0) t.prev_free_block = 0;


      current_addr = (void *) ((int32_t) current_addr + 256);

    free_block_ptr[n_alloc_size] = (small_block *) ((int32_t) current_addr - 256);
    cout << "memory fill complete, initialisation complete" << endl;



Here’s a small driver program. Under 64bit systems you may need to force 32bit compilation. e.g: g++ tialloctest.cpp -m32. It’s also worth noting I’ve only tried this code under gcc on a Mac (Snow Leopard).

#include "tialloc.h"

int main() {

  cout << "size of small_block is: " << sizeof(small_block) << endl;

  tialloc m_allocator;

  char *i0 = (char *) m_allocator.alloc(1);

  cout << "alloc pointer i0 is: " << ((void *)i0) << endl;
  *i0 = 0xFE;
  char *i1 = (char *) m_allocator.alloc(1);
  char *i2 = (char *) m_allocator.alloc(1);
  char *i3 = (char *) m_allocator.alloc(1);

  *i1 = 0xFD;
  *i2 = 0xFE;
  *i3 = 0xFD;


  vector<uint8_t  > values;
  vector<uint8_t *> memorylocations;

  // test size 1 allocations:
  // fill
  for(int n=0;n<10000;n++) {
    uint8_t rand_val = rand()%256;

    uint8_t *i = (uint8_t *) m_allocator.alloc(1);
    *i = rand_val;


    // self validate
    uint8_t value = *(memorylocations[memorylocations.size()-1]);
    if(value != values[values.size()-1]) {
      cout << "self storage validation error, " << (int) value << " != " << (int) values[values.size()-1] << "    " << n << endl;
      int *i =0; *i = 1;
    value = *(memorylocations[memorylocations.size()-1]);

  // verify
  for(int n=0;n<values.size();n++) {
    uint8_t value = *(memorylocations[n]);
    if(n%1000 == 0) cout << "validated: " << n << endl;
    if(value != values[n]) cout << "storage validation error, " << (int) value << " != " << (int) values[n] << "    " << n << endl;

  // free
  for(int n=0;n<memorylocations.size();n++) {

cout << "4 bytes" << endl;

  // test random sized allocations
  vector<uint32_t  > values4;
  vector<uint32_t *> memorylocations4;
  // fill
  for(int n=0;n<10000;n++) {
    uint32_t rand_val = rand();

    uint32_t *i = (uint32_t *) m_allocator.alloc(4);
    *i = rand_val;


    // self validate
    uint32_t value = *(memorylocations4[memorylocations4.size()-1]);
    if(value != values4[values4.size()-1]) {
      cout << "self storage validation error, " << (int) value << " != " << (int) values4[values4.size()-1] << "    " << n << endl;
      int *i =0; *i = 1;
    value = *(memorylocations4[memorylocations4.size()-1]);

  // verify
  for(int n=0;n<values4.size();n++) {
    uint32_t value = *(memorylocations4[n]);
    if(n%1000 == 0) cout << "validated: " << n << endl;
    if(value != values4[n]) cout << "storage validation error, " << (int) value << " != " << (int) values4[n] << "    " << n << endl;

  // free
  for(int n=0;n<memorylocations4.size();n++) {