TreeHaus
TreeHaus is a lightweight, zero-dependency, pure Python 3 library for persistent tree-based indexes.
treehaus.TreeHaus can be used to create and open files containing a set of dict-like, tree-based indices, each represented by a treehaus.Index instance.
Limitations
TreeHaus aims to provide tree-based storage with simplicity and robustness. Look elsewhere if high performance is a high priority.
Calls to an individual store instance and its associated index instances should not be made concurrently from multiple threads.
Multiple TreeHaus stores can be opened concurrently on the same file, but only one instance should be opened for writing.
When a TreeHaus store is opened it will see only the latest committed changes.
TreeHaus uses a Python 3 serialization protocol. It will not be possible to read a TreeHaus file using other programming languages (including earlier versions of Python).
Keys and values may be of any type supported by the Python 3 pickle protocol. However, the keys within each index should all be of the same type and be comparable with the Python > and < operators.
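The key constraints above can be checked before anything is written. The sketch below is a hypothetical helper and not part of TreeHaus: the name check_key is an assumption, and it relies only on the standard pickle module and Python's comparison operators to test that a candidate key survives a pickle round trip and can be ordered against keys already in use.

```python
import pickle


def check_key(key, existing_keys):
    """Return True if key survives a pickle round trip and orders against existing_keys.

    Hypothetical pre-flight check; TreeHaus itself is not involved.
    """
    try:
        restored = pickle.loads(pickle.dumps(key))
        for other in existing_keys:
            # Both operators must be defined between the two key types.
            _ = restored < other
            _ = restored > other
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


isbn_keys = ["9780140449136", "9781853260629"]
print(check_key("9780199232765", isbn_keys))  # str key checked against str keys
print(check_key(42, isbn_keys))               # int key checked against str keys
```

Mixing key types in one index is the case this guards against: in Python 3, comparing an int with a str using < raises TypeError.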
Example
import os.path
from treehaus import TreeHaus

path = "simple.th"
# start from a clean file, since TreeHaus.create expects the path not to exist
if os.path.exists(path):
    os.unlink(path)
TreeHaus.create(path)

with TreeHaus.open(path) as th1:
    books = th1["books"]
    books["9780140449136"] = { "title": "Crime and Punishment", "author":"Fyodor Mikhailovich Dostoyevsky" }
    books["9781853260629"] = { "title":"War and Peace", "author":"Leo Tolstoy" }

with TreeHaus.open(path,readOnly=True) as th2:
    books = th2["books"]
    for (k,v) in books:
        print(str(k) + " -> " + str(v))

# prints
# 9780140449136 -> {'title': 'Crime and Punishment', 'author': 'Fyodor Mikhailovich Dostoyevsky'}
# 9781853260629 -> {'title': 'War and Peace', 'author': 'Leo Tolstoy'}
TreeHaus is a data store consisting of indexes persisted to a file. Use TreeHaus.create to create a store and then TreeHaus.open to open it and return a TreeHaus instance.
- static TreeHaus.create(path, initial_nodesize=10)
  Create an empty TreeHaus store.
  Parameters: path (str) – the path of the file to create, which should not already exist
  Keyword Arguments: initial_nodesize (int) – integer >= 2, defines the size of btree internal nodes
  Raises: FileExistsError – raised if the file already exists
  A way you might use me is:
  >>> path = "data.th"
  >>> TreeHaus.create(path)
- static TreeHaus.open(path, readOnly=False, openAtUpdate=None)
  Open a TreeHaus store.
  Parameters: path (str) – the path of the file to open, which should point to an existing TreeHaus file
  Keyword Arguments:
    - readOnly (bool) – whether the file should be opened in read-only mode (True) or writable mode (False)
    - openAtUpdate (int) – open the file in read-only mode at a particular update number
  Returns: a treehaus.TreeHaus instance allowing the data in the file to be accessed
  Raises: FileNotExistsError – raised if the file does not exist
  A way you might use me is:
  >>> path = "data.th"
  >>> th = TreeHaus.open(path)
TreeHaus methods:
- TreeHaus.getIndex(indexName)
  Obtain an index within the store, creating it if it does not already exist.
  Parameters: indexName (str) – the name of the index to obtain
  Returns: a treehaus.Index instance allowing key-value pairs to be written and read
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> path = "data.th"
  >>> th = TreeHaus.open(path)
  >>> books = th.getIndex("books")
- TreeHaus.getIndices()
  Get the names and numbers of stored keys of all indices in the store.
  Returns: a list of (name, number of keys) tuples, one for each index in the store
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> path = "~/data.th"
  >>> th = TreeHaus.open(path)
  >>> th.getIndices()
  [('books', 107), ('records', 33)]
- TreeHaus.removeIndex(indexName)
  Remove an index from the store.
  Parameters: indexName (str) – the name of the index to remove
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> path = "~/data.th"
  >>> th = TreeHaus.open(path)
  >>> th.removeIndex("records")
- TreeHaus.getUpdates()
  Get an iterator over the set of updates that were successfully committed to the store. The most recent updates are returned first.
  Returns: iterator returning (updateNumber,timestamp,indexNameCardinalityMap,metadata) tuples
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> path = "~/data.th"
  >>> th = TreeHaus.open(path)
  >>> next(th.getUpdates())
  (23, 1558273679, {'books': 107, 'records': 33}, 'add latest titles')
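The tuples returned by getUpdates can be turned into a readable log line. The following is a minimal sketch that works on a literal tuple copied from the example above rather than on a live store; describe_update is a hypothetical helper name, and it assumes the timestamp field is seconds since the Unix epoch, which the documentation does not state.

```python
from datetime import datetime, timezone


def describe_update(update):
    """Format one (updateNumber, timestamp, indexNameCardinalityMap, metadata) tuple."""
    number, timestamp, cardinalities, metadata = update
    # Assumption: timestamp is seconds since the Unix epoch, shown here in UTC.
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    total_keys = sum(cardinalities.values())
    return f"#{number} {when} keys={total_keys} {metadata!r}"


# Tuple copied from the getUpdates example above.
latest = (23, 1558273679, {"books": 107, "records": 33}, "add latest titles")
print(describe_update(latest))
```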
- TreeHaus.commit(metadata='')
  Commit all outstanding changes in the store to file.
  Returns: an updateNumber for the commit
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> path = "~/data.th"
  >>> th = TreeHaus.open(path)
  >>> th.getIndex("books")["9781853260629"]={"title":"War and Peace","author":"Leo Tolstoy"}
  >>> th.commit()
  24
- TreeHaus.rollback()
  Cancel all outstanding changes made to the store.
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> path = "~/data.th"
  >>> th = TreeHaus.open(path)
  >>> th.getIndex("books")["9781853260629"]={"title":"War and Peace","author":"David Tolstoy"}
  >>> th.rollback() # mistake - should be Leo not David - do not persist this change
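The relationship between commit and rollback can be pictured with a toy in-memory model. This is an illustration only and says nothing about how TreeHaus stores data on disk: ToyStore and its attributes are hypothetical names, and the rule that update numbers increase by one per commit is an assumption.

```python
class ToyStore:
    """In-memory stand-in for staged writes; not TreeHaus code."""

    def __init__(self):
        self.committed = {}     # state as of the last commit
        self.pending = {}       # writes made since the last commit
        self.update_number = 0  # assumption: incremented by one per commit

    def __setitem__(self, key, value):
        self.pending[key] = value

    def commit(self):
        """Fold pending writes into the committed state and return the update number."""
        self.committed.update(self.pending)
        self.pending.clear()
        self.update_number += 1
        return self.update_number

    def rollback(self):
        """Discard every write made since the last commit."""
        self.pending.clear()


store = ToyStore()
store["9781853260629"] = {"title": "War and Peace", "author": "Leo Tolstoy"}
first = store.commit()
store["9781853260629"] = {"title": "War and Peace", "author": "David Tolstoy"}
store.rollback()  # the mistaken author never reaches the committed state
```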
- TreeHaus.close()
  Close the instance if it is not already closed. Any in-progress updates will be committed. After close is called, any opened indices or iterators can no longer be used.
  A way you might use me is:
  >>> path = "~/data.th"
  >>> with TreeHaus.open(path) as th:
  ...     pass # read and write to indexes
  ...
  >>> # after the with block ends, TreeHaus.close() will have been called
- static TreeHaus.getVersion()
  Get the TreeHaus version.
  Returns: version number of TreeHaus as a string in the format “VMajor.VMinor”
  A way you might use me is:
  >>> TreeHaus.getVersion()
  '0.1'
Index
A TreeHaus Index is a dict-like container within a TreeHaus store, returned from a call to treehaus.TreeHaus.getIndex.
Methods to retrieve information:
- Index.__getitem__(key)
  Get an item from this index.
  Parameters: key – the key value to retrieve
  Returns: the value of the key
  Raises: KeyError – raised if the key does not exist in the index
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> book_details = books["0140449132"]
- Index.__contains__(key)
  Test if a key exists in the index.
  Parameters: key – the key value to test
  Returns: True if the key is found in the index, or False otherwise
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> "0140449132" in books
  True
- Index.get(key, defaultvalue=None)
  Get an item from this index.
  Parameters:
    - key – the key value to retrieve
    - defaultvalue – a value to return if the key was not found
  Returns: the value of the key if found in the index, or the defaultvalue otherwise
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> book_details = books.get("0140449132",{"title":"unknown","author":"unknown"})
- Index.traverse(start=None, stop=None)
  Iterate over the index in key order.
  Keyword Arguments:
    - start – start the iterator from this key value
    - stop – end the iterator at this key value
  Returns: an iterator which returns (key,value) pairs
  Raises:
    - treehaus.StoreClosedException – raised if the store has been closed
    - treehaus.IndexModifiedException – raised during iteration if the index was modified after the iterator was opened
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> for (isbn,details) in books.traverse():
  ...     print(str(isbn))
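The effect of start and stop can be illustrated on an ordinary sorted list. The sketch below is not how TreeHaus walks its tree; traverse_range is a hypothetical function built on the standard bisect module, and it assumes both bounds are inclusive, which the description above leaves open.

```python
from bisect import bisect_left, bisect_right


def traverse_range(pairs, start=None, stop=None):
    """Yield (key, value) pairs from a key-sorted list between start and stop.

    Assumption: both bounds are inclusive; the TreeHaus docs do not say.
    """
    keys = [key for key, _ in pairs]
    lo = 0 if start is None else bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_right(keys, stop)
    yield from pairs[lo:hi]


# A key-sorted list of pairs stands in for an index.
catalogue = sorted({"b": 2, "d": 4, "a": 1, "c": 3}.items())
for key, value in traverse_range(catalogue, start="b", stop="c"):
    print(key, value)
```

Because the bounds are located by binary search, a start or stop value that is not itself a key still selects the keys that fall between the two bounds.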
- Index.rtraverse(start=None, stop=None)
  Iterate over the index in reverse key order.
  Keyword Arguments:
    - start – start the iterator from this key value
    - stop – end the iterator at this key value
  Returns: an iterator which returns (key,value) pairs in reverse key order
  Raises:
    - treehaus.StoreClosedException – raised if the store has been closed
    - treehaus.IndexModifiedException – raised during iteration if the index was modified after the iterator was opened
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> for (isbn,details) in books.rtraverse():
  ...     print(str(isbn))
- Index.history(key)
  Return a history of the values assigned to a key, most recently assigned value first.
  Parameters: key – the key of interest
  Returns: an iterator which returns (updateNumber,value) pairs indicating that the key was assigned that value at that updateNumber
  Raises:
    - treehaus.StoreClosedException – raised if the store has been closed
    - treehaus.IndexModifiedException – raised during iteration if the index was modified after the iterator was opened
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> for (updateNumber,value) in books.history("0140449132"):
  ...     print(str((updateNumber,value)))
  ...
  (45, {'title': 'Crime and Punishment', 'author': 'Fyodor Mikhailovich Dostoyevsky'})
  (23, {'title': 'Crimes and Punishment', 'author': 'Dave Dostoyevsky'})
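The shape of what history returns can be mimicked with a small in-memory log. This is a toy model for illustration, not the mechanism TreeHaus uses: ToyHistory and record are hypothetical names, and the update numbers are simply those from the example above.

```python
class ToyHistory:
    """Keeps every value assigned to a key, tagged with an update number (toy model)."""

    def __init__(self):
        self._log = {}  # key -> list of (updateNumber, value), oldest first

    def record(self, update_number, key, value):
        self._log.setdefault(key, []).append((update_number, value))

    def history(self, key):
        """Iterate (updateNumber, value) pairs, most recent assignment first."""
        return reversed(self._log.get(key, []))


log = ToyHistory()
log.record(23, "0140449132", {"title": "Crimes and Punishment", "author": "Dave Dostoyevsky"})
log.record(45, "0140449132", {"title": "Crime and Punishment", "author": "Fyodor Mikhailovich Dostoyevsky"})
for update_number, value in log.history("0140449132"):
    print(update_number, value["title"])
```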
Methods to update the index:
- Index.__setitem__(key, value)
  Add or modify an item in this index.
  Parameters:
    - key – the key value to add
    - value – the associated value to add
  Raises:
    - treehaus.StoreClosedException – raised if the store has been closed
    - treehaus.ReadOnlyException – raised if the store is opened read-only
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> books["0140449132"] = { "title": "Crime and Punishment", "author":"Fyodor Mikhailovich Dostoyevsky" }
- Index.__delitem__(key)
  Delete an item from this index.
  Parameters: key – the key value to delete
  Returns: the value of the key that was removed
  Raises:
    - treehaus.StoreClosedException – raised if the store has been closed
    - treehaus.ReadOnlyException – raised if the store is opened read-only
    - KeyError – raised if the key does not exist in the index
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> del books["0140449132"]
- Index.pop(key, defaultvalue=None)
  Remove and return a value for a specified key.
  Parameters:
    - key – the key to retrieve and remove
    - defaultvalue – the value to return if the key was not found
  Returns: if the key already existed in the index its value is returned, otherwise defaultvalue is returned
  Raises:
    - treehaus.StoreClosedException – raised if the store has been closed
    - treehaus.ReadOnlyException – raised if the store is opened read-only
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> removed_book = books.pop("0140449132")
- Index.update(other)
  Update the index with multiple (key,value) pairs.
  Parameters: other – dict or iterable returning (key,value) pairs
  Raises: treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> newtitles = {}
  >>> newtitles["9780140449136"] = { "title": "Crime and Punishment", "author":"Fyodor Mikhailovich Dostoyevsky" }
  >>> newtitles["9781853260629"] = { "title":"War and Peace", "author":"Leo Tolstoy" }
  >>> books.update(newtitles)
  >>> books["9781853260629"]
  {'title': 'War and Peace', 'author': 'Leo Tolstoy'}
  >>> books["9780140449136"]
  {'title': 'Crime and Punishment', 'author': 'Fyodor Mikhailovich Dostoyevsky'}
  >>> th.commit()
  54
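Accepting either a dict or an iterable of (key,value) pairs follows the same pattern as the built-in dict.update. A minimal sketch of that dispatch is shown below, with a plain dict standing in for a treehaus.Index; apply_update is a hypothetical helper and not the library's own code.

```python
def apply_update(index, other):
    """Write every (key, value) pair from other into a dict-like index."""
    # Mappings expose items(); anything else is assumed to yield (key, value) pairs.
    pairs = other.items() if hasattr(other, "items") else other
    for key, value in pairs:
        index[key] = value


books = {}  # a plain dict stands in for a treehaus.Index
apply_update(books, {"9781853260629": {"title": "War and Peace"}})
apply_update(books, [("9780140449136", {"title": "Crime and Punishment"})])
```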
- Index.clear()
  Remove all keys from the index.
  Raises:
    - treehaus.ReadOnlyException – raised if the store is opened read-only
    - treehaus.StoreClosedException – raised if the store has been closed
  A way you might use me is:
  >>> from treehaus import TreeHaus
  >>> th = TreeHaus.open("data.th")
  >>> books = th.getIndex("books")
  >>> books.clear()
  >>> len(books)
  0