Faruk Akgul

Introduction to Algorithms: Patricia Trie

February 29, 2012 | View Comments

According to NIST[1] definition Patricia trie is; a compact representation of a trie in which any node that's an only child is merged with its parent.

Patricia Trie was discovered by Donald Morrison in 1968. It takes log N bit for comparison of each search even though insert/delete operations run on O(n).

Nodes in Patricia Trie contain a field which tracks the bit positions. We use a dummy node and initialize its bit to -1. Root of the trie is root.left.

template<typename K, typename T>
class INode {
public:
  virtual ~INode() {}
  virtual unsigned hash(K key, unsigned bit) = 0;
};

template<typename T>
class StringKeyNode: public INode<string, T> {

private:
  static const int SIZE = sizeof(string) << 2;

public:

  string key;
  T value;
  int bit;
  StringKeyNode<T>* left;
  StringKeyNode<T>* right;

  StringKeyNode(string key, T value, int bit) {
    this->key = key;
    this->value = value;
    this->bit = bit;
  }

  ~StringKeyNode() {
    this->left = NULL;
    this->right = NULL;
    this->bit = -1;
    this->value = NULL;
  }

  virtual unsigned hash(string key, unsigned i) {
    if(key.empty()) return 0;
    unsigned index = static_cast<unsigned>(i / SIZE);
    if(index >= key.length()) return 0;
    unsigned val = (1 << SIZE - 1) >> (i & SIZE - 1);
    return key.at(index) & val;
  }
};

Insertion

Inserting a new node takes O(n) time.

For example;

    -- NULL : initialization
    -- insert:  ababbb -> key
                123456 -> positions
    -- insert:  ababba:
                search: ababb**b** == ababba
                diff. at position 6;

         [6]
        /   \
       a     b
      /       \
    ababba   ababbb

Inserting a new node begins with searching. search method returns a unique key. Once we detect the leftmost position of the key we're inserting and the search key, we call the insert_inner method which recursively walks the tree and inserts the node where node is pointing at.

StringKeyNode<T>* insert_inner(StringKeyNode<T>* node,
                                  K key,
                                  T value,
                                  int i,
                                  StringKeyNode<T>* root) {
    if(node->bit >= i || node->bit <= root->bit) {
      StringKeyNode<T>* temp = new StringKeyNode<T>(key, value, i);
      const bool FLAG = node->hash(key, temp->bit) != 0;
      temp->left = FLAG ? node : temp;
      temp->right = FLAG ? temp : node;
      return temp;
      delete temp;
    }
    if(node->hash(key, node->bit) == 0) {
      node->left = cilk_spawn insert_inner(node->left, key, value, i, node);
    } else {
      node->right = insert_inner(node->right, key, value, i, node);
    }
    cilk_sync;
    return node;
}

void insert(string key, T value) {
    StringKeyNode<T>* node;
    node = cilk_spawn search_inner(this->root->left, key, -1);
    cilk_sync;
    if(node != NULL) {
      string temp = node->key;
      if(temp.compare(key) != 0) {
        int i = 0;
        while(node->hash(key, i) == node->hash(temp, i)) ++i;
        this->root->left = cilk_spawn insert_inner(this->root->left,
                                                    key,
                                                    value,
                                                    i,
                                                    this->root);
        cilk_sync;
      } else {
        node->value = value;
      }
    }
  }

Searching

Important part of searching is the search_inner method which walks the trie recursively and finds a unique key which can have the record of node's key by using the bits of the trie.

StringKeyNode<T>* search_inner(StringKeyNode<T>* node, K key, int i) {
    if(node->bit <= i) return node;
    if(node->hash(key, node->bit) == 0) {
      return search_inner(node->left, key, node->bit);
    } else {
      return search_inner(node->right, key, node->bit);
    }
}

T search(string key) {
    StringKeyNode<T>* node;
    node = cilk_spawn search_inner(this->root->left, key, -1);
    cilk_sync;
    if(node == NULL || node->key != key) return NULL;
    return node->value;
}

Inorder

Putting the nodes in order is no different than an ordinary tree. The only difference is we check the bits of the nodes to detect if node's bit is smaller than parent's bit.

string inorder_inner(StringKeyNode<T>* node, int i) {
    if(node == this->root) return "";
    if(node->bit <= i) return node->value;
    return inorder_inner(node->left, node->bit) + inorder_inner(node->right,
                                                                  node->bit);
}

string inorder() {
    return inorder_inner(this->root->left, -1);
}

Testing

Here's a small code to insert some values to test the trie.

vector<string> generate_names(string s) {
  // generating some strings to put into trie.
  vector<char>v(s.begin(), s.end());
  vector<string> e;
  e.push_back(s);
  while(next_permutation(v.begin(), v.end())) {
    stringstream ss;
    for(unsigned i = 0; i < v.size(); i++) ss << v[i];
    e.push_back(ss.str());
  }
  return e;
}

int factorial(int n) {
  int res = 1;
  for(int i = 1; i <= n; i++) res *= i;
  return res;
}

Benchmarking

To see how our trie performs in various of scenarios and different workers, we could use cilkview for analyzing.

int cilk_main() {

  cilk::cilkview cv;

  string generator = "abc";
  const int LENGTH = factorial(generator.length());
  vector<string> names = generate_names(generator);

  cv.start();

  for(int i = 0; i < LENGTH; i++)
    tree->insert(names[i], SomeValue(names[i], names[i]));

  cv.stop();

  cout << "tree filled in " << cv.accumulated_milliseconds() / 1000.f << " seconds.\n";
  cv.reset();

}

Fun Stuff

Since we've constructed our trie and it works like a champ so now we could play with it. Let's rotate the trie (for absolutely no particular reason)

void rotate_inner(StringKeyNode<T>* node, int bit) {
    if(node != NULL && node->bit > bit) {
      cilk_spawn rotate_inner(node->left, node->bit);
      rotate_inner(node->right, node->bit);
      cilk_sync;
      swap(node->left, node->right);
    }
}

void rotate() {
    cilk_spawn rotate_inner(this->root->left, -1);
    cilk_sync;
}

Source Code

You could get the source code on GitHub. Feel free to fork and improve or fix it. Released under "do whatever you want to do with it" license.

References:

  1. NIST
Share:Tweet

blog comments powered by Disqus