wiki:KrakenBloomFilter

Kraken Bloom Filter

Kraken Bloom Filter implements bloom filter algorithm. This algorithm is generally used in spell checker and secondary index. See  bloom filter described in wikipedia.

Author

Interface

Bloom filter uses statistical data structure based on bitmap. Because of this, bloom filter has some false positive errors. Fortunately, we can control error rate, capacity, and number of hash functions. If you configure error rate to very low and give small capacity, then number of hash functions increases. That means you need more hash calculations and more time. There is a trade off between time and space.

public class BloomFilter<T> {
	public BloomFilter();
	public BloomFilter(long capacity);
	public BloomFilter(HashFunction<T> first, HashFunction<T> second);
	public BloomFilter(double errorRate, long capacity, HashFunction<T>, HashFunction<T> second);
	public BloomFilter(double errorRate, long capacity, HashFunction<T> first, HashFunction<T> second, BitSet bitmap);

	// core operations: add() and contains()
	public void add(T key);
	public boolean contains(T key);

	public BitSet getBitmap();
}

Bloom filter constructor calculates optimal capacity and number of hash funcions automatically.

Maven configuration

<dependency>
	<groupId>org.krakenapps</groupId>
	<artifactId>kraken-bloomfilter</artifactId>
	<version>1.0.0</version>
</dependency>

History

  • 1.0.0 release (2010-08-24)