Parallel Hashmap

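  • With getpy [0], a Python wrapper, you can get about 200x faster map reads in parallel: in the session below, one batched lookup of 500 keys takes 2.19 µs, versus 491 µs for a per-key Python loop.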

        In [1]: import numpy as np
           ...: import getpy as gp
        
        In [2]: key_type = np.dtype('u8')
           ...: value_type = np.dtype('u8')
        
        In [3]: keys = np.random.randint(1, 1000, size=10**2, dtype=key_type)
           ...: values = np.random.randint(1, 1000, size=10**2, dtype=value_type)
           ...: 
           ...: gp_dict = gp.Dict(key_type, value_type, default_value=42)
           ...: gp_dict[keys] = values
           ...: 
           ...: random_keys = np.random.randint(1, 1000, size=500, dtype=key_type)
        
        In [4]: %timeit random_values = gp_dict[random_keys]
        2.19 µs ± 11.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
        
        In [7]: %timeit [gp_dict[k] for k in random_keys]
        491 µs ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    
    [0] https://github.com/atom-moyer/getpy
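
    For readers without getpy installed, here is a minimal NumPy-only sketch of what a batched read with a default value looks like. It is not getpy's implementation: it swaps the hash map for a sorted array plus np.searchsorted, and the names build_index and batch_get are hypothetical. It also assumes that missing keys come back as default_value, which is how the default_value=42 argument in the session above reads. The point is only that the whole key array is resolved in one call instead of one Python-level lookup per key.

```python
import numpy as np

def build_index(keys, values):
    """Sort the keys once so batched lookups can use binary search.

    Hypothetical helper; assumes at least one key. For duplicate keys the
    last value wins, as with plain dict assignment.
    """
    last = dict(zip(keys.tolist(), values.tolist()))
    sorted_keys = np.array(sorted(last), dtype=keys.dtype)
    sorted_values = np.array([last[k] for k in sorted_keys.tolist()],
                             dtype=values.dtype)
    return sorted_keys, sorted_values

def batch_get(sorted_keys, sorted_values, queries, default_value):
    """Resolve every query in one vectorized pass; misses get default_value."""
    pos = np.searchsorted(sorted_keys, queries)
    pos = np.minimum(pos, len(sorted_keys) - 1)   # clamp so indexing stays in bounds
    hit = sorted_keys[pos] == queries             # True where the key really exists
    out = np.where(hit, sorted_values[pos], default_value)
    return out.astype(sorted_values.dtype)

keys = np.array([5, 1, 9], dtype=np.uint64)
values = np.array([50, 10, 90], dtype=np.uint64)
sorted_keys, sorted_values = build_index(keys, values)

queries = np.array([1, 2, 9, 100], dtype=np.uint64)
result = batch_get(sorted_keys, sorted_values, queries, default_value=42)

# The per-key loop gives the same answers, one Python-level lookup at a time.
plain = dict(zip(keys.tolist(), values.tolist()))
looped = [plain.get(k, 42) for k in queries.tolist()]
```

    Both result and looped hold the same values; the batched form just avoids the interpreter overhead of the loop, which is the gap the timings above are measuring.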

  • Still the state of the art, beating folly and the Swiss tables on parallel benchmarks.

    Even on single-threaded workloads it's about 10x faster than std::unordered_map. My smhasher repository has the benchmark tables.