手搓布隆过滤器

布隆过滤器的核心思想是：

使用一个大的 位数组（BitSet）
使用 多个哈希函数（HashFunction）
插入元素时，对元素进行多次哈希并将对应的位数组位置置为true（遍历哈希函数，便于使用不同的种子来把一个要哈希的对象变为多个int值放入bitset中）
查询元素时，只要任意一个对应位为0，就确定该元素一定不存在；如果所有位都为1，可能存在（存在误判）

import java.util.BitSet;  
  
public class MyBloomFilter {  
    private static final int DEFAULT_SIZE = Integer.MAX_VALUE;  
    private static final int MIN_SIZE = 1000;  
    private static int SIZE = DEFAULT_SIZE;  
    //hash函数种子因子  
    private static final int[] HASH_SEEDS = new int[]{3, 5, 7, 11, 13, 17, 19, 23, 29, 31};  
  
    private BitSet bitSet = null;  
  
    //HASH函数  
    private HashFunction[] hashFunctions = new HashFunction[HASH_SEEDS.length];  
  
    public MyBloomFilter() {  
        init();  
    }  
  
    public MyBloomFilter(int size) {  
        if (size >= MIN_SIZE) {  
            SIZE = size;  
        }  
        init();  
    }  
  
    private void init() {  
        bitSet = new BitSet(SIZE);  
        for (int i = 0; i < HASH_SEEDS.length; i++) {  
            hashFunctions[i] = new HashFunction(SIZE, HASH_SEEDS[i]);  
        }  
    }  
  
    public void add(Object value) {  
        for (HashFunction f : hashFunctions) {  
            bitSet.set(f.hash(value), true);  
        }  
    }  
  
    public boolean contains(Object value){  
        boolean result = true;  
        for (HashFunction f : hashFunctions) {  
            result = result && bitSet.get(f.hash(value));  
  
            if (!result){  
                return result;  
            }  
        }  
        return result;  
    }  
  
    public static class HashFunction {  
        private int size;  
        private int seed;  
  
        public HashFunction(int size, int seed) {  
            this.size = size;  
            this.seed = seed;  
        }  
  
        public int hash(Object value){  
            if (value == null){  
                return 0;  
            }  
            int hash1 = value.hashCode();  
  
            int hash2 = hash1 >>> 16;  
  
            int combine = hash1 ^ hash2;  
  
            return Math.abs(combine * seed) % size;  
  
        }  
    }  
  
    public static void main(String[] args) {  
        Integer num1 = 31312;  
        Integer num2 = 312312;  
  
        MyBloomFilter bloomFilter = new MyBloomFilter();  
  
        System.out.println(bloomFilter.contains(num1) + " " + bloomFilter.contains(num2));  
  
        bloomFilter.add(num1);  
        bloomFilter.add(num2);  
  
        System.out.println(bloomFilter.contains(num1) + " " + bloomFilter.contains(num2));  
  
        System.out.println(bloomFilter.contains(312432));  
    }  
}

详解关键

int hash1 = value.hashCode();  
  
int hash2 = hash1 >>> 16;  
  
int combine = hash1 ^ hash2;  
  
return Math.abs(combine * seed) % size;

这段代码是布隆过滤器中哈希函数的核心，用于将任意对象转化为在 [0, size) 范围内的整数索引。

1	int hash1 = value.hashCode();

✅ 获取原始哈希值

使用 Java 的 hashCode() 方法得到对象的哈希值。
这是 Java 中所有对象都具备的，通常能反映其数据特征。

1	int hash2 = hash1 >>> 16;

✅ 提取高位信息，做混淆处理

将 hash1 无符号右移 16 位，得到高位部分。
这么做是为了将高位信息与低位结合，增强哈希的分布性（也叫“扰动函数”）。

类似思路也出现在 HashMap 的源码中：

// HashMap.java 中对 hashCode 的扰动函数
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

1	int combine = hash1 ^ hash2;

✅ 异或操作，打乱模式

将原始哈希值与高位进行异或，增强随机性，避免哈希冲突集中。
异或操作本质上是一种 位级扰动（bit-level mixing）。

1	return Math.abs(combine * seed) % size;

✅ 与种子结合，压缩到位图范围

乘上 seed：
- 使用不同的种子值，可以生成多个不同的哈希函数。
- 这是布隆过滤器中实现多个独立哈希函数的一种方式。
取绝对值：
- 保证哈希结果是非负的。
模 size：
- 将哈希结果限定到 BitSet 的合法索引范围 [0, size)。

🔎 整体目的是：

增强分布性和抗冲突性（扰动 hash1）
通过种子值构造出多个不同的哈希函数
最终将结果落入 BitSet 有效范围内

✅ 总结

步骤	目的
`hashCode()`	获取对象哈希值
`>>> 16` + `^`	扰动，提高分布性
`* seed`	构造多个哈希函数
`Math.abs(...) % size`	映射到 BitSet 的有效索引范围

这是一种 轻量级但分布性还可以的哈希函数构造方式，适合用于布隆过滤器中快速构建多个 hash 函数。如果想进一步优化精度，可以考虑使用 MurmurHash, FNV, Guava.hashing() 等更高质量的哈希算法。