Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor optimization idea for filter[WithKey] #434

Open
sjakobi opened this issue Apr 24, 2022 · 1 comment
Open

Minor optimization idea for filter[WithKey] #434

sjakobi opened this issue Apr 24, 2022 · 1 comment

Comments

@sjakobi
Copy link
Member

sjakobi commented Apr 24, 2022

These functions are currently sharing code with mapMaybe[WithKey] via filterMapAux.

-- | Common implementation for 'filterWithKey' and 'mapMaybeWithKey',
-- allowing the former to former to reuse terms.
filterMapAux :: forall k v1 v2
. (HashMap k v1 -> Maybe (HashMap k v2))
-> (Leaf k v1 -> Maybe (Leaf k v2))
-> HashMap k v1
-> HashMap k v2
filterMapAux onLeaf onColl = go
where
go Empty = Empty
go t@Leaf{}
| Just t' <- onLeaf t = t'
| otherwise = Empty
go (BitmapIndexed b ary) = filterA ary b
go (Full ary) = filterA ary fullBitmap
go (Collision h ary) = filterC ary h
filterA ary0 b0 =
let !n = A.length ary0
in runST $ do
mary <- A.new_ n
step ary0 mary b0 0 0 1 n
where
step :: A.Array (HashMap k v1) -> A.MArray s (HashMap k v2)
-> Bitmap -> Int -> Int -> Bitmap -> Int
-> ST s (HashMap k v2)
step !ary !mary !b i !j !bi n
| i >= n = case j of
0 -> return Empty
1 -> do
ch <- A.read mary 0
case ch of
t | isLeafOrCollision t -> return t
_ -> BitmapIndexed b <$> A.trim mary 1
_ -> do
ary2 <- A.trim mary j
return $! if j == maxChildren
then Full ary2
else BitmapIndexed b ary2
| bi .&. b == 0 = step ary mary b i j (bi `unsafeShiftL` 1) n
| otherwise = case go (A.index ary i) of
Empty -> step ary mary (b .&. complement bi) (i+1) j
(bi `unsafeShiftL` 1) n
t -> do A.write mary j t
step ary mary b (i+1) (j+1) (bi `unsafeShiftL` 1) n
filterC ary0 h =
let !n = A.length ary0
in runST $ do
mary <- A.new_ n
step ary0 mary 0 0 n
where
step :: A.Array (Leaf k v1) -> A.MArray s (Leaf k v2)
-> Int -> Int -> Int
-> ST s (HashMap k v2)
step !ary !mary i !j n
| i >= n = case j of
0 -> return Empty
1 -> do l <- A.read mary 0
return $! Leaf h l
_ | i == j -> do ary2 <- A.unsafeFreeze mary
return $! Collision h ary2
| otherwise -> do ary2 <- A.trim mary j
return $! Collision h ary2
| Just el <- onColl $! A.index ary i
= A.write mary j el >> step ary mary (i+1) (j+1) n
| otherwise = step ary mary (i+1) j n
{-# INLINE filterMapAux #-}

The inner step loops currently work by reading from input Arrays and writing to new_ output MArrays. If we were to give up the code sharing, the loops in filter[WithKey] could perform both reads and writes on thawed output MArrays, removing the input arrays arguments from the inner loops, and possibly increasing cache locality.

For filterC, this might look roughly like this:

@@ -2116,13 +2116,13 @@ filterMapAux onLeaf onColl = go
     filterC ary0 h =
         let !n = A.length ary0
         in runST $ do
-            mary <- A.new_ n
-            step ary0 mary 0 0 n
+            mary <- A.thaw ary0 0 n
+            step mary 0 0 n
       where
-        step :: A.Array (Leaf k v1) -> A.MArray s (Leaf k v2)
+        step :: A.MArray s (Leaf k v2)
              -> Int -> Int -> Int
              -> ST s (HashMap k v2)
-        step !ary !mary i !j n
+        step !mary i !j n
             | i >= n    = case j of
                 0 -> return Empty
                 1 -> do l <- A.read mary 0
@@ -2131,9 +2131,10 @@ filterMapAux onLeaf onColl = go
                                  return $! Collision h ary2
                   | otherwise -> do ary2 <- A.unsafeFreeze =<< A.shrink mary j
                                     return $! Collision h ary2
-            | Just el <- onColl $! A.index ary i
-                = A.write mary j el >> step ary mary (i+1) (j+1) n
-            | otherwise = step ary mary (i+1) j n
+            | otherwise = do
+                case A.read mary i of
+                    Just el -> A.write mary j el >> step mary (i+1) (j+1) n
+                    Nothing -> step mary (i+1) j n
 {-# INLINE filterMapAux #-}
@treeowl
Copy link
Collaborator

treeowl commented Apr 29, 2022

If one of the arrays falls out of L2 cache while applying the filter function, then the other one will probably do so too. I don't think cache locality is likely to be an issue there. That leaves register pressure, I guess? I don't know enough about such low-level stuff, TBH, but I guess it could make a difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants