Custom hash objects in Python
Note: This is not a tip on implementing hash functions, this is how you can remove a certain layer of peeking around into objects
__hash and __cmp__
Consider a useless HTML parser with a simple node design where you want to associate the node name with its attributes.
You want to use the absolute node name as a unique hash.
The solution is to define custom implementations for __hash__ and __cmp__, two magic methods. For more information and constraints about them take a look at the Python docs.
The builtin functions hash(obj) and cmp(obj1, obj2) will attempt to call there __underscored__ counterparts on objs.
import UserDict # allow NodeAttrs to behave like a dictionary, not significant for this example
class NodeName(object):
def __init__(self, name, parent=None):
self.name = name
self.parent = parent
def __str__(self):
return (parent and str(parent) or '') + self.name
def __hash__(self):
# the hash of our string is our unique hash
return hash(str(self))
def __cmp__(self, other):
# similarly the strings are good for comparisons
return cmp(str(self), str(other))
class NodeAttrs(UserDict.UserDict):
def __init__(self, attrs={}):
self.update(attrs)
Where we assume that the parser is doing the heavy lifting of parsing the name, and putting the attributes in a dictionary. Now to use this in a dictionary you would do the following:
>>> d = {}
>>> d[node_name] = attrs # node_name is an instance of NodeName and attrs
... # do anything which can be done to a dictionary and its keys
Thats it! For more magic methods see Python __Underscore__ Methods