I implemented the first function from this page. It has historically worked well, caused no collisions on my small sample, and produced hashes that gave good results in min_hash_sim.py. That said, reading the page makes me wonder whether we could look for a better one in the future.
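For anyone who wants to repeat the collision check on a larger sample, here is a rough sketch of how it could be done. The names `find_collisions` and `placeholder_hash` are made up for illustration, and `placeholder_hash` is a deliberately weak stand-in, not the function from the page; swap in the real one when running it.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_collisions(
    hash_fn: Callable[[str], int], items: Iterable[str]
) -> Dict[int, List[str]]:
    """Group distinct items by hash value; keep only groups with more than one item."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    # Deduplicate first so a repeated input is not reported as a collision.
    for item in set(items):
        buckets[hash_fn(item)].append(item)
    return {h: sorted(group) for h, group in buckets.items() if len(group) > 1}


def placeholder_hash(text: str) -> int:
    """Hypothetical stand-in (NOT the function from the page): sum of code points."""
    return sum(ord(ch) for ch in text) % (2 ** 32)


sample = ["apple", "banana", "cherry", "ab", "ba"]
collisions = find_collisions(placeholder_hash, sample)
print(collisions)  # "ab" and "ba" share a hash under this weak placeholder
```

An empty result only says there were no collisions on that particular sample, so it is worth rerunning on the full input set before relying on it.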