🌸 Bloom Filter Essentials

Overview

Maintains a fixed-length bit array and k independent hash functions.
Supports add and possibly contains checks in constant time.
No deletions in the basic form; removing items requires counting filters or rebuilds.

How It Works

Insert: run the element through each hash function and set the corresponding bit positions to 1.
Query: recompute the hashes; if any bit is 0, the element is definitely absent. If all are 1, it may exist.

How It Works

Example

Bit array length (m) = 13
Hash functions (k) = 3

Adding "banana" sets bits [6, 2, 9] to 1. Adding "apple" sets [10, 9, 4]. Checking "orange" hits a 0 bit, so it is definitely absent.

Example

Search

Checking “orange”:
- Hash values: 7, 12
- Bit array: Positions 7 and 12 are 0, so “orange” is not in the set.

Checking “banana”:
- Hash values: 6, 2, 9
- Bit array: Positions 6, 2, and 9 are all 1, so “banana” might be in the set.

Nevertheless, the Bloom filter can produce a false positive response when an object does not actually exist but the Bloom filter claims otherwise. This happens when all hash functions for the object return hash values of 1 corresponding to other already inserted objects in the filter.

False Positives & Tuning

Occur when multiple items set the same bit positions.
Probability rises as the filter fills up; manage load factor (n, number of inserted items) relative to m.
Reduce false positives by increasing m, increasing k, or limiting n.

Parameter	Effect
Larger bit array (`m`)	Lowers collision rate, more memory.
More hash functions (`k`)	Improves accuracy until diminishing returns; typically `k ≈ (m/n) ln 2`.
Fewer items (`n`)	Keeps filter sparse and accurate.

Implementation Sketch (JavaScript)

class BloomFilter {
  constructor(size, hashCount) {
    this.size = size;
    this.hashCount = hashCount;
    this.bits = new Array(size).fill(0);
  }

  hash(item, seed) {
    let hash = 0;
    for (const ch of item) {
      hash = (hash << 5) - hash + ch.charCodeAt(0);
      hash |= 0; // keep 32-bit
    }
    return Math.abs(hash + seed * 31) % this.size;
  }

  add(item) {
    for (let i = 0; i < this.hashCount; i++) {
      this.bits[this.hash(item, i)] = 1;
    }
  }

  mightContain(item) {
    for (let i = 0; i < this.hashCount; i++) {
      if (this.bits[this.hash(item, i)] === 0) return false;
    }
    return true;
  }
}

// Example usage
const size = 100; // Choose the size of the bit array
const hashCount = 3; // Choose the number of hash functions

const bloomFilter = new BloomFilter(size, hashCount);

// Adding items to the filter
const itemsToAdd = ['dog', 'cat', 'bird'];
itemsToAdd.forEach(item => bloomFilter.add(item));

// Testing if items might be in the filter
const testItems = ['dog', 'cat', 'bird', 'fish', 'horse'];
testItems.forEach(item => {
  if (bloomFilter.check(item)) {
    console.log(`'${item}' is probably in the set.`);
  } else {
    console.log(`'${item}' is definitely not in the set.`);
  }
});

When to Use

Fast cache-miss checks (e.g., CDN, database read-through caches).
Prevent expensive lookups for definitely absent keys.
Deduplication in streaming pipelines.

Key Takeaways

Space-efficient membership checks with configurable false positive rate.
Requires well-distributed hash functions and pre-sized parameters.
Monitor fill ratio; rebuild or expand when false positives trend upward.