Hacker News

Parallel Hashmap

by cr4zy on 4/23/2022, 6:40:30 PM with 2 comments

by cr4zy on 4/23/2022, 6:42:30 PM

With getpy[0], a Python wrapper, you can get roughly 200x faster map reads in parallel by passing a whole array of keys in one call instead of looking them up one at a time:

    In [1]: import numpy as np
       ...: import getpy as gp
    
    In [2]: key_type = np.dtype('u8')
       ...: value_type = np.dtype('u8')
    
    In [3]: keys = np.random.randint(1, 1000, size=10**2, dtype=key_type)
       ...: values = np.random.randint(1, 1000, size=10**2, dtype=value_type)
       ...: 
       ...: gp_dict = gp.Dict(key_type, value_type, default_value=42)
       ...: gp_dict[keys] = values
       ...: 
       ...: random_keys = np.random.randint(1, 1000, size=500, dtype=key_type)
    
    In [4]: %timeit random_values = gp_dict[random_keys]
    2.19 µs ± 11.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    
    In [7]: %timeit [gp_dict[k] for k in random_keys]
    491 µs ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
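
For reference, here is a rough pure-Python sketch of what the batched read above returns, assuming a missing key yields the `default_value` passed to the constructor. `batched_get` is a hypothetical helper, not part of getpy, and plain lists stand in for the NumPy arrays. It only shows the result semantics, not the speed.

```python
import random

def batched_get(table, keys, default_value):
    """Hypothetical stand-in for a batched read: one result per requested key,
    with default_value substituted for keys that are not in the table."""
    return [table.get(k, default_value) for k in keys]

rng = random.Random(0)

# Mirror the shapes used in the session above: 100 stored pairs, 500 lookups.
keys = [rng.randint(1, 999) for _ in range(100)]
values = [rng.randint(1, 999) for _ in range(100)]
table = dict(zip(keys, values))  # for duplicate keys, the last value wins here

random_keys = [rng.randint(1, 999) for _ in range(500)]
random_values = batched_get(table, random_keys, default_value=42)
```

The per-key list comprehension in the second timing does this same work, but crosses the Python/C++ boundary once per key, which is where the extra time goes.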

[0] https://github.com/atom-moyer/getpy

by rurban on 4/24/2022, 4:47:07 AM
Still the state of the art, beating folly and the swisstables on parallel benchmarks.
And even on single-threaded workloads it's about 10x faster than std::unordered_map. My smhasher has benchmark tables.