TreeHaus

TreeHaus is a lightweight, zero-dependency, pure Python 3 library for persistent tree-based indexes.

treehaus.TreeHaus can be used to create and open files containing a set of dict-like, tree-based TreeHaus indices, each represented by a treehaus.Index instance.

Limitations

TreeHaus aims to provide tree-based storage with simplicity and robustness. Look elsewhere if high performance is a high priority.

Calls to an individual store instance and associated index instances should not be invoked concurrently from multiple threads.

Multiple TreeHaus stores can be opened concurrently on the same file, but only one instance should be opened for writing.

When a TreeHaus store is opened it will see only the latest committed changes.

TreeHaus uses a python3 serialization protocol. It will not be possible to read a TreeHaus file using other programming languages (including earlier versions of python).

Keys and values may be of any type supported by the python3 pickle protocol. However, keys within each index should all be of the same type and be comparable with the python > and < operators.
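The key requirements above can be checked before anything is written to an index. The following is a minimal sketch using only the standard library; the helper name check_keys is hypothetical and is not part of TreeHaus.

```python
import pickle

def check_keys(keys):
    """Return True if all keys share one type, survive a pickle round trip
    and can be ordered with < and >."""
    keys = list(keys)
    if len({type(k) for k in keys}) > 1:
        return False  # the text above asks for one key type per index
    try:
        for k in keys:
            if pickle.loads(pickle.dumps(k)) != k:
                return False
        sorted(keys)  # raises TypeError if the keys do not support ordering
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True

print(check_keys(["9780140449136", "9781853260629"]))  # True
print(check_keys(["9780140449136", 42]))               # False
```

A check like this is only a pre-flight test on candidate keys; it says nothing about how TreeHaus itself validates them.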

Example


import os.path
from treehaus import TreeHaus

path = "simple.th"

# TreeHaus.create expects that the file does not already exist
if os.path.exists(path):
    os.unlink(path)

TreeHaus.create(path)

# open for writing; outstanding changes are committed when the with block closes the store
with TreeHaus.open(path) as th1:
    books = th1["books"]
    books["9780140449136"] = {"title": "Crime and Punishment", "author": "Fyodor Mikhailovich Dostoyevsky"}
    books["9781853260629"] = {"title": "War and Peace", "author": "Leo Tolstoy"}

# reopen read-only and iterate over the (key, value) pairs
with TreeHaus.open(path, readOnly=True) as th2:
    books = th2["books"]
    for (k, v) in books:
        print(str(k) + " -> " + str(v))

    # prints
    # 9780140449136 -> {'title': 'Crime and Punishment', 'author': 'Fyodor Mikhailovich Dostoyevsky'}
    # 9781853260629 -> {'title': 'War and Peace', 'author': 'Leo Tolstoy'}

TreeHaus is a data store consisting of indexes persisted to a file. Use TreeHaus.create to create a store and then TreeHaus.open to open it and return a TreeHaus instance.

static TreeHaus.create(path, initial_nodesize=10)

Create an empty TreeHaus store

Parameters:path (str) – the path of the file to create, which should not already exist
Keyword Arguments:
 initial_nodesize (int) – an integer >= 2 that defines the size of B-tree internal nodes
Raises:FileExistsError – raised if the file already exists

A way you might use me is:

>>> path = "data.th"
>>> TreeHaus.create(path)

static TreeHaus.open(path, readOnly=False, openAtUpdate=None)

Open a TreeHaus store

Parameters:

path (str) – the path of the file to open, which should point to an existing TreeHaus file

Keyword Arguments:
 
  • readOnly (bool) – whether file should be opened in read-only mode (True) or writable mode (False)
  • openAtUpdate (int) – open the file in read-only mode at a particular update number
Returns:

a treehaus.TreeHaus instance allowing the data in the file to be accessed

Raises:

FileNotExistsError – raised if the file does not exist

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)

TreeHaus methods:

TreeHaus.getIndex(indexName)

Obtain an index within the store, creating it if it does not already exist

Parameters:indexName (str) – the name of the index to obtain
Returns:a treehaus.Index instance allowing key-value pairs to be written and read
Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)
>>> books = th.getIndex("books")

TreeHaus.getIndices()

Get the names and numbers of stored keys of all indices in the store

Returns:a list of (name, count) tuples giving the name and number of stored keys of each index
Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)
>>> th.getIndices()
[('books', 107), ('records', 33)]

TreeHaus.removeIndex(indexName)

Remove an index from the store

Parameters:indexName (str) – the name of the index to remove
Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)
>>> th.removeIndex("records")

TreeHaus.getUpdates()

Get an iterator over the set of updates that were successfully committed to the store

Returns:iterator returning (updateNumber,timestamp,indexNameCardinalityMap,metadata) tuples
Raises:treehaus.StoreClosedException – raised if the store has been closed

Most recent updates are returned first

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)
>>> next(th.getUpdates())
(23, 1558273679, {'books': 107, 'records': 33}, 'add latest titles')
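The tuples returned by getUpdates can be turned into a readable log. This is a minimal sketch: describe_update is a hypothetical helper, and it assumes the timestamp is expressed in seconds since the Unix epoch, which the text above does not state.

```python
from datetime import datetime, timezone

def describe_update(update):
    """Format one (updateNumber, timestamp, indexNameCardinalityMap, metadata) tuple."""
    number, timestamp, cardinalities, metadata = update
    # Assumption: timestamp is seconds since the Unix epoch
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    sizes = ", ".join(f"{name}={count}" for name, count in sorted(cardinalities.items()))
    return f"update {number} at {when.isoformat()} [{sizes}] {metadata!r}"

# the tuple shown in the example above
print(describe_update((23, 1558273679, {'books': 107, 'records': 33}, 'add latest titles')))
```

With an open store, the same helper could be applied to each tuple yielded by th.getUpdates().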
TreeHaus.commit(metadata='')

Commit all outstanding changes in the store to file

Returns:an updateNumber for the commit.
Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)
>>> th.getIndex("books")["9781853260629"] = {"title": "War and Peace", "author": "Leo Tolstoy"}
>>> th.commit()
24

TreeHaus.rollback()

Cancel all outstanding changes made to the store

Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> path = "data.th"
>>> th = TreeHaus.open(path)
>>> th.getIndex("books")["9781853260629"] = {"title": "War and Peace", "author": "David Tolstoy"}
>>> th.rollback()  # mistake: should be Leo, not David, so do not persist this change
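commit and rollback can be combined so that a failed batch of writes leaves nothing outstanding. The sketch below uses only the getIndex, update, rollback and commit methods described here; the function name update_or_rollback is hypothetical, and the store argument is assumed to be an open, writable TreeHaus instance.

```python
def update_or_rollback(store, index_name, changes, metadata=""):
    """Apply changes to one index and commit them, rolling back on any error."""
    index = store.getIndex(index_name)
    try:
        index.update(changes)
    except Exception:
        store.rollback()  # discard the partial changes before re-raising
        raise
    return store.commit(metadata)  # the update number of the new commit
```

One call might look like update_or_rollback(th, "books", newtitles, "add latest titles").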
TreeHaus.close()

Close the instance if it is not already closed. Any in-progress updates will be committed.

After close is called, any opened indices or iterators can no longer be used.

A way you might use me is:

>>> path = "data.th"
>>> with TreeHaus.open(path) as th:
...     pass  # read and write to indexes here
>>> # after the with block ends, TreeHaus.close() will have been called

static TreeHaus.getVersion()

Get the TreeHaus version

Returns:Version number of TreeHaus as a string in the format “VMajor.VMinor”

A way you might use me is:

>>> TreeHaus.getVersion()
'0.1'

Index

A TreeHaus Index is a dict-like container within a TreeHaus store, returned from a call to treehaus.TreeHaus.getIndex.

Methods to retrieve information:

Index.__getitem__(key)

Get an item from this index

Parameters:key – the key value to retrieve
Returns:the value of the key
Raises:KeyError – (if the key does not exist in the index)

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> book_details = books["0140449132"]

Index.__contains__(key)

Test if a key exists in the index

Parameters:key – the key value to test
Returns:True if the key is found in the index, or False otherwise
Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> "0140449132" in books
True

Index.get(key, defaultvalue=None)

Get an item from this index

Parameters:
  • key – the key value to retrieve
  • defaultvalue – a value to return if the key was not found
Returns:

the value of the key if found in the index, or the defaultvalue otherwise

Raises:

treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> book_details = books.get("0140449132", {"title": "unknown", "author": "unknown"})

Index.traverse(start=None, stop=None)

Iterate over the index in key order

Keyword Arguments:
 
  • start – start the iterator from this key value
  • stop – end the iterator at this key value
Returns:

an iterator which returns (key,value) pairs

Raises:
  • treehaus.StoreClosedException – raised if the store has been closed
  • treehaus.IndexModifiedException – raised during iteration if the index was modified after the iterator was opened

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> for (isbn, details) in books.traverse():
...     print(str(isbn))
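Because traverse visits keys in order, the start argument can be used for a prefix scan over string keys. This is a minimal sketch: items_with_prefix is a hypothetical helper, and it assumes that traverse(start=...) begins at the first key greater than or equal to start, which the description above does not spell out.

```python
def items_with_prefix(index, prefix):
    """Yield (key, value) pairs whose string key starts with prefix."""
    # Assumption: traverse(start=prefix) begins at the first key >= prefix
    for key, value in index.traverse(start=prefix):
        if not key.startswith(prefix):
            break  # keys are visited in order, so no later key can match
        yield key, value
```

For the books index, list(items_with_prefix(books, "978")) would collect the entries whose ISBN begins with 978. The sketch avoids the stop argument so that it does not depend on whether stop is inclusive.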
Index.rtraverse(start=None, stop=None)

Iterate over the index in reverse order

Keyword Arguments:
 
  • start – start the iterator from this key value
  • stop – end the iterator at this key value
Returns:

an iterator which returns (key,value) pairs in reverse key order

Raises:
  • treehaus.StoreClosedException – raised if the store has been closed
  • treehaus.IndexModifiedException – raised during iteration if the index was modified after the iterator was opened

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> for (isbn, details) in books.rtraverse():
...     print(str(isbn))
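Since modifying an index while one of its iterators is open raises treehaus.IndexModifiedException, deletions driven by a traversal are safest done in two passes. A minimal sketch, with the hypothetical helper name delete_where:

```python
def delete_where(index, predicate):
    """Delete every item for which predicate(key, value) is true; return the count."""
    # Exhaust the iterator first so the index is not modified while it is open
    doomed = [key for key, value in index.traverse() if predicate(key, value)]
    for key in doomed:
        del index[key]
    return len(doomed)
```

For example, delete_where(books, lambda isbn, details: details["author"] == "unknown") would remove all entries with an unknown author. The list of keys is held in memory, so this suits indexes of modest size.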
Index.history(key)

Return a history of the values assigned to a key, most recently assigned value first

Parameters:

key – the key of interest

Returns:

an iterator which returns (updateNumber,value) pairs indicating that the key was assigned that value at that updateNumber

Raises:
  • treehaus.StoreClosedException – raised if the store has been closed
  • treehaus.IndexModifiedException – raised during iteration if the index was modified after the iterator was opened

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> for (updateNumber, value) in books.history("0140449132"):
...     print(str((updateNumber, value)))
(45, {'title': 'Crime and Punishment', 'author': 'Fyodor Mikhailovich Dostoyevsky'})
(23, {'title': 'Crimes and Punishment', 'author': 'Dave Dostoyevsky'})
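Because history yields the most recent assignment first, the value a key held as of a given update can be found with a single pass. The sketch below is illustrative only: value_at is a hypothetical helper, a plain list stands in for books.history(key), and how deletions appear in the history is not described above, so they are not handled.

```python
def value_at(history, update_number):
    """Return the value a key held as of update_number, or None if it had none yet."""
    # history yields (updateNumber, value) pairs, most recent first
    for number, value in history:
        if number <= update_number:
            return value
    return None

# the pairs printed in the example above, used as a stand-in for books.history(key)
history = [
    (45, {"title": "Crime and Punishment", "author": "Fyodor Mikhailovich Dostoyevsky"}),
    (23, {"title": "Crimes and Punishment", "author": "Dave Dostoyevsky"}),
]
print(value_at(history, 30)["title"])  # Crimes and Punishment
```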

Methods to update the index:

Index.__setitem__(key, value)

Add or modify an item in this index

Parameters:
  • key – the key value to add
  • value – the associated value to add
Raises:
  • treehaus.StoreClosedException – raised if the store has been closed
  • treehaus.ReadOnlyException – (if the store is opened read-only)

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> books["0140449132"] = { "title": "Crime and Punishment", "author":"Fyodor Mikhailovich Dostoyevsky" }

Index.__delitem__(key)

Delete an item from this index

Parameters:

key – the key value to delete

Returns:

the value of the key that was removed

Raises:
  • treehaus.StoreClosedException – raised if the store has been closed
  • treehaus.ReadOnlyException – (if the store is opened read-only)
  • KeyError – (if the key does not exist in the index)

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> del books["0140449132"]  # a del statement yields no value; use books.pop(key) to obtain the removed value

Index.pop(key, defaultvalue=None)

Remove and return a value for a specified key

Parameters:
  • key – the key to retrieve and remove
  • defaultvalue – the value to return if the key was not found
Returns:

if the key already existed in the index its value is returned, otherwise defaultvalue is returned

Raises:
  • treehaus.StoreClosedException – raised if the store has been closed
  • treehaus.ReadOnlyException – (if the store is opened read-only)

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> removed_book = books.pop("0140449132")

Index.update(other)

Update the index with multiple (key,value) pairs

Parameters:other – dict or iterable returning (key,value) pairs
Raises:treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> newtitles = {}
>>> newtitles["9780140449136"] = { "title": "Crime and Punishment", "author":"Fyodor Mikhailovich Dostoyevsky" }
>>> newtitles["9781853260629"] = { "title":"War and Peace", "author":"Leo Tolstoy" }
>>> books.update(newtitles)
>>> books["9781853260629"]
{'title': 'War and Peace', 'author': 'Leo Tolstoy'}
>>> books["9780140449136"]
{'title': 'Crime and Punishment', 'author': 'Fyodor Mikhailovich Dostoyevsky'}
>>> th.commit()
54
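Since update also accepts an iterable of (key,value) pairs, records can be streamed into an index without first building a dict. In this minimal sketch keyed_by is a hypothetical helper and a plain dict stands in for the index, because dict.update accepts the same kind of iterable.

```python
def keyed_by(records, field):
    """Yield (key, record) pairs, taking each key from record[field]."""
    for record in records:
        yield record[field], record

catalogue = [
    {"isbn": "9781853260629", "title": "War and Peace", "author": "Leo Tolstoy"},
    {"isbn": "9780140449136", "title": "Crime and Punishment", "author": "Fyodor Mikhailovich Dostoyevsky"},
]

# a plain dict stands in for the index; with TreeHaus the call would be books.update(...)
books = {}
books.update(keyed_by(catalogue, "isbn"))
print(sorted(books))  # ['9780140449136', '9781853260629']
```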
Index.clear()

Remove all keys from the index

Raises:
  • treehaus.ReadOnlyException – (if the store is opened read-only)
  • treehaus.StoreClosedException – raised if the store has been closed

A way you might use me is:

>>> from treehaus import TreeHaus
>>> th = TreeHaus.open("data.th")
>>> books = th.getIndex("books")
>>> books.clear()
>>> len(books)
0