Kamikaze is a utility package that wraps set implementations over document lists. It also implements the P4Delta compression algorithm for sorted integer segments, enabling inverted list compression for search engines such as Lucene.

Details

Kamikaze version 1.0.0 provides DocSet implementations over several underlying document ID set representations for inverted lists in search engines. The currently supported implementations are:

  1. Integer Array representation: document set based on dynamic integer arrays.
  2. OpenBitSet representation: document set based on the OpenBitSet implementation from Lucene.
  3. P4Delta representation: document set for sorted integer segments, compressed using a variation of the P4Delta compression algorithm.
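
To make the third representation more concrete, here is a minimal sketch of the general idea behind this family of schemes: store the gaps between consecutive sorted document IDs in a fixed number of bits, and keep the few gaps that do not fit as exceptions. This is not Kamikaze's actual format or API. The class `DeltaPackSketch`, its `Block` layout, and the `encode`/`decode` methods are hypothetical names chosen for illustration, and the sketch assumes non-negative IDs in ascending order and a bit width between 1 and 31.

```java
import java.util.Arrays;

/** Illustrative only: delta + bit-packing with exceptions; not Kamikaze's real format. */
final class DeltaPackSketch {

    /** One encoded segment (hypothetical layout). */
    static final class Block {
        final int count;      // number of doc ids
        final int first;      // first doc id, stored verbatim
        final int bits;       // width of each packed gap
        final long[] packed;  // gaps that fit in `bits` bits
        final int[] excSlots; // slots whose gap did not fit
        final int[] excGaps;  // the full gap for each such slot

        Block(int count, int first, int bits, long[] packed, int[] excSlots, int[] excGaps) {
            this.count = count; this.first = first; this.bits = bits;
            this.packed = packed; this.excSlots = excSlots; this.excGaps = excGaps;
        }
    }

    /** Encodes ascending, non-negative ids using `bits` bits per gap (1..31). */
    static Block encode(int[] sorted, int bits) {
        int gaps = Math.max(0, sorted.length - 1);
        long[] packed = new long[(gaps * bits + 63) / 64];
        int[] excSlots = new int[gaps];
        int[] excGaps = new int[gaps];
        int exc = 0;
        long limit = 1L << bits;
        for (int slot = 0; slot < gaps; slot++) {
            int gap = sorted[slot + 1] - sorted[slot];
            if (gap < limit) {
                put(packed, slot, bits, gap);   // common case: small gap, packed
            } else {
                excSlots[exc] = slot;           // rare case: keep the full gap aside
                excGaps[exc] = gap;
                exc++;
            }
        }
        int first = sorted.length == 0 ? 0 : sorted[0];
        return new Block(sorted.length, first, bits, packed,
                Arrays.copyOf(excSlots, exc), Arrays.copyOf(excGaps, exc));
    }

    /** Rebuilds the original ids by summing gaps, patching in exceptions. */
    static int[] decode(Block b) {
        int[] out = new int[b.count];
        if (b.count == 0) return out;
        out[0] = b.first;
        int e = 0;
        for (int slot = 0; slot < b.count - 1; slot++) {
            long gap = get(b.packed, slot, b.bits);
            if (e < b.excSlots.length && b.excSlots[e] == slot) gap = b.excGaps[e++];
            out[slot + 1] = (int) (out[slot] + gap);
        }
        return out;
    }

    /** Writes `value` into the slot-th `bits`-wide field; a field may span two words. */
    private static void put(long[] words, int slot, int bits, long value) {
        int bitPos = slot * bits;
        int word = bitPos >>> 6, shift = bitPos & 63;
        words[word] |= value << shift;
        if (shift + bits > 64) words[word + 1] |= value >>> (64 - shift);
    }

    /** Reads the slot-th `bits`-wide field. */
    private static long get(long[] words, int slot, int bits) {
        int bitPos = slot * bits;
        int word = bitPos >>> 6, shift = bitPos & 63;
        long v = words[word] >>> shift;
        if (shift + bits > 64) v |= words[word + 1] << (64 - shift);
        return v & ((1L << bits) - 1);
    }

    public static void main(String[] args) {
        int[] ids = {3, 4, 7, 8, 10, 1000, 1001, 1005};
        Block block = encode(ids, 3);
        System.out.println(block.excSlots.length + " exception(s), round trip ok: "
                + Arrays.equals(ids, decode(block)));
    }
}
```

The bit width trades space against the number of exceptions: a smaller width packs tighter but pushes more gaps into the exception arrays. How Kamikaze chooses the width and stores exceptions is not described here, so treat those details as open.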

References

The library also provides elementary set operations (AND, OR, NOT) on DocSets without materializing the final document set. This is particularly useful for large sorted integer segments, where building the full result up front would be costly.
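
As a rough illustration of set operations that never materialize their result, the sketch below composes iterators over sorted document IDs so that each result is computed only when the caller asks for it. It does not use Kamikaze's classes: `LazyDocSetOps`, `DocIterator`, `advance`, and `NO_MORE_DOCS` are hypothetical names, loosely modeled on the skip-to style of iteration common in search engines, and the sketch assumes non-negative IDs in ascending order and targets that never decrease.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: lazy AND / OR / NOT over sorted doc id iterators. */
final class LazyDocSetOps {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical iterator: returns the first doc id >= target, or NO_MORE_DOCS. */
    interface DocIterator {
        int advance(int target);
    }

    /** Iterator over an ascending array of doc ids. */
    static DocIterator of(int... sortedDocs) {
        return new DocIterator() {
            private int pos = 0;
            @Override public int advance(int target) {
                while (pos < sortedDocs.length && sortedDocs[pos] < target) pos++;
                return pos < sortedDocs.length ? sortedDocs[pos] : NO_MORE_DOCS;
            }
        };
    }

    /** Intersection: leapfrog the two inputs until they agree on a doc id. */
    static DocIterator and(DocIterator a, DocIterator b) {
        return target -> {
            int x = a.advance(target);
            int y = b.advance(x);
            while (x != y) {
                if (x < y) x = a.advance(y); else y = b.advance(x);
            }
            return x;
        };
    }

    /** Union: the smaller of the two candidates at or after the target. */
    static DocIterator or(DocIterator a, DocIterator b) {
        return target -> Math.min(a.advance(target), b.advance(target));
    }

    /** Complement within [0, maxDoc): skip every candidate that appears in `a`. */
    static DocIterator not(DocIterator a, int maxDoc) {
        return target -> {
            int candidate = target;
            while (candidate < maxDoc && a.advance(candidate) == candidate) candidate++;
            return candidate < maxDoc ? candidate : NO_MORE_DOCS;
        };
    }

    /** Pulls every doc id out of an iterator; only used here to show results. */
    static List<Integer> drain(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int d = it.advance(0); d != NO_MORE_DOCS; d = it.advance(d + 1)) out.add(d);
        return out;
    }

    public static void main(String[] args) {
        DocIterator result = and(of(1, 3, 5, 7, 9), or(of(3, 4), of(9, 10)));
        System.out.println(drain(result)); // prints [3, 9]
    }
}
```

Only `drain` allocates a list, and only for display. A consumer that scores or counts matches can call `advance` directly, so memory use stays independent of the result size.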