Recently I was surprised by the behaviour of a Haskell program I wrote. This simple example demonstrates the problem:

-- Simple.hs
import System.Environment
main = do
  n <- fmap (read . head) getArgs
  print (length (sequence (replicate n "abc")))

This computes the length of the list formed by every n-sized combination of the letters “a”, “b” and “c”. For example, if n is 4, the first few elements of the list are “aaaa”, “aaab”, “aaac”, “aaba”. (In other words, it’s a slow way to compute 3 to the power of n.) I expected this program to take near-constant space, even though the list might be very long, because it can be consumed by length as it is produced by sequence.
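To make that concrete, here is what the expression produces for a small n in GHCi (my own illustration, not part of the program above). In the list monad, sequence enumerates every way of picking one character from each string:

ghci> sequence (replicate 2 "abc")
["aa","ab","ac","ba","bb","bc","ca","cb","cc"]
ghci> length (sequence (replicate 2 "abc"))
9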

However, as n increases the space required grows exponentially.

For example:

$ ghc Simple
[1 of 1] Compiling Main             ( Simple.hs, Simple.o )
Linking Simple ...

$ ./Simple 10 +RTS -t
59049
<<ghc: 17814600 bytes, 35 GCs, 586672/1241512 avg/max bytes residency
(4 samples), 4M in use, 0.00 INIT (0.00 elapsed), 0.01 MUT (0.01
elapsed), 0.02 GC (0.02 elapsed) :ghc>>

$ ./Simple 11 +RTS -t
177147
<<ghc: 53244416 bytes, 102 GCs, 1740492/3603424 avg/max bytes
residency (6 samples), 11M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT
(0.02 elapsed), 0.11 GC (0.14 elapsed) :ghc>>

The “in use” value is the interesting one: 4MB for n=10, and 11MB for n=11.

Even more surprising is what happens when you give your own definition of sequence:

-- Simple2.hs
import System.Environment
main = do
  n <- fmap (read . head) getArgs
  print (length (mySequence (replicate n "abc")))

mySequence [] = return []
mySequence (y:ys) = do
  x <- y
  xs <- mySequence ys
  return (x:xs)

Now the same examples as before:

$ ghc Simple2
[1 of 1] Compiling Main             ( Simple2.hs, Simple2.o )
Linking Simple2 ...

$ ./Simple2 10 +RTS -t
59049
<<ghc: 54185632 bytes, 105 GCs, 52976/61536 avg/max bytes residency (2
samples), 1M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.03
elapsed), 0.00 GC (0.00 elapsed) :ghc>>

$ ./Simple2 11 +RTS -t
177147
<<ghc: 176535256 bytes, 340 GCs, 52940/61464 avg/max bytes residency
(2 samples), 1M in use, 0.00 INIT (0.00 elapsed), 0.06 MUT (0.06
elapsed), 0.01 GC (0.01 elapsed) :ghc>>

This time we get constant space. Now take the same program and turn on optimization:

$ ghc Simple2 -O
[1 of 1] Compiling Main             ( Simple2.hs, Simple2.o )
Linking Simple2 ...

$ ./Simple2 10 +RTS -t
59049
<<ghc: 7151368 bytes, 14 GCs, 696204/1274720 avg/max bytes residency
(4 samples), 5M in use, 0.00 INIT (0.01 elapsed), 0.01 MUT (0.01
elapsed), 0.03 GC (0.03 elapsed) :ghc>>

$ ./Simple2 11 +RTS -t
177147
<<ghc: 21323424 bytes, 41 GCs, 1615464/3571488 avg/max bytes residency
(5 samples), 10M in use, 0.00 INIT (0.00 elapsed), 0.01 MUT (0.02
elapsed), 0.08 GC (0.09 elapsed) :ghc>>

And memory use is high again — optimization has made it worse. What is going on?

The explanation is that GHC has tried to make our definition of mySequence faster, but at the expense of using more memory. It has observed that the value of mySequence ys does not depend on what value is chosen for x, and moved it out of the loop. In effect, it has transformed our definition into:

mySequence [] = return []
mySequence (y:ys) =
    let mSys = mySequence ys
    in do
      x <- y
      xs <- mSys
      return (x:xs)

The result for our combination-counting program is that to compute the big list of n-sized combinations, it first computes the list of (n-1)-sized combinations and keeps this sub-list in memory so that it can put an “a”, “b” or “c” on the front of each element. That is, while exploring all the combinations that start with “a”, the sub-list stays in memory so it doesn’t have to be computed again later to make all the combinations that start with “b” and “c”. The sub-list is pretty big, and that’s the cause of the high memory use.
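One way to see why GHC is allowed to do this is to desugar the do-block by hand (a sketch of the standard do-notation translation, not actual GHC output):

-- original definition, desugared: the recursive call sits inside the
-- \x -> lambda, so a fresh, unshared thunk is built for every choice of x
mySequence (y:ys) = y >>= \x -> mySequence ys >>= \xs -> return (x:xs)

-- after full laziness: the call has been floated out of the lambda,
-- so a single thunk is shared across every choice of x
mySequence (y:ys) = let mSys = mySequence ys
                    in  y >>= \x -> mSys >>= \xs -> return (x:xs)

Since mySequence ys does not mention x, nothing stops GHC from lifting it above the lambda; the cost is that the result of that single thunk is now kept alive while the rest of y is still being worked through.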

This optimization is called “full laziness”, and you can turn it off:

$ ghc Simple2 -O -fno-full-laziness

$ ./Simple2 10 +RTS -t
59049
<<ghc: 54152416 bytes, 105 GCs, 31114/44416 avg/max bytes residency
(11 samples), 2M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.02
elapsed), 0.03 GC (0.03 elapsed) :ghc>>

$ ./Simple2 11 +RTS -t
177147
<<ghc: 176502040 bytes, 340 GCs, 30356/44416 avg/max bytes residency
(34 samples), 2M in use, 0.00 INIT (0.00 elapsed), 0.05 MUT (0.05
elapsed), 0.06 GC (0.06 elapsed) :ghc>>

With the optimization turned off, memory use is constant again.

When I first encountered this problem I had no idea what the cause was. After some searching I found this discussion on Stack Overflow, which explains what is going on and describes a couple of ways to work around it (apart from just turning off the full laziness optimization).
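If you would rather not remember the flag on every ghc command line, the same workaround can be attached to the module itself with an OPTIONS_GHC pragma (a small sketch; this is just the per-module form of the flag, not something quoted from the Stack Overflow discussion):

{-# OPTIONS_GHC -fno-full-laziness #-}
-- Simple2.hs, otherwise unchanged: the flag now travels with the module
-- instead of having to be passed on every ghc invocation.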

A second problem

A variation of the problem comes courtesy of Albert Lai. In this program, we also get exponential memory use, but changing the optimization level doesn’t fix it. Here is the program in question:

-- BadMemory.hs
import System.Environment(getArgs)
main = do
   n <- (read . head) `fmap` getArgs
   print (length (mysequence (replicate n "012")))
mysequence ms = foldr k (return []) ms where
   k m m' = do
     x <- m
     xs <- m'
     return (x:xs)

On the surface it looks similar to the previous programs. In particular, the definition of k is very close to mySequence above and so you might expect that the full laziness optimization is again the culprit. But turning off optimization here doesn’t help.
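Desugaring k by hand (the same standard do-notation translation as before) makes the resemblance to mySequence explicit:

-- k, desugared: identical in shape to the body of mySequence, except that
-- m' is an argument handed to k by foldr rather than a call k makes itself
k m m' = m >>= \x -> m' >>= \xs -> return (x:xs)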

In this case the foldr is the problem. Recall the behaviour of foldr:

foldr f z [a,b,c,d,e] = f a (f   b (f   c (f   d (f   e     z))))
                      =   a `f` (b `f` (c `f` (d `f` (e `f` z))))
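These equations follow directly from the textbook definition of foldr on lists:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)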

In the misbehaving program, we have:

mysequence ms = foldr k (return []) ms
              = m0 `k` (m1 `k` (m2 `k` ... (mn `k` return []) ... ))

where ms = [m0, m1, ... mn]. If we examine the outermost call to k more closely, we get:

  m0 `k`                    (m1 `k` (m2 `k` ... (mn `k` return []) ... ))
= k                      m0 (m1 `k` (m2 `k` ... (mn `k` return []) ... ))
= (\m m' -> <body of k>) m0 (m1 `k` (m2 `k` ... (mn `k` return []) ... ))

This is equivalent in behaviour to:

let m  = m0
    m' = (m1 `k` (m2 `k` ... (mn `k` return []) ... ))
in <body of k>

The foldr version has done in the source code what full laziness did as a compiler transformation: the big sub-list is bound to a name (here k's argument m'), causing it to be shared across the loop over the choice of the first element (the x <- m). Because the sharing is written into the program itself this time, changing the optimization level makes no difference.
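Given all of that, the fix I would reach for is simply to combine the two things this post has already demonstrated: go back to the explicit recursion of Simple2, so the sub-list computation is rebuilt under the x <- m binding, and pin down full laziness with the pragma. This is only a sketch (the module name FixedMemory.hs is mine, and it is not a recommendation from Albert Lai or the Stack Overflow thread):

{-# OPTIONS_GHC -fno-full-laziness #-}
-- FixedMemory.hs: BadMemory.hs with the foldr replaced by the explicit
-- recursion from Simple2.hs, plus the pragma so that full laziness cannot
-- reintroduce the sharing under -O.
import System.Environment (getArgs)

main :: IO ()
main = do
  n <- (read . head) `fmap` getArgs
  print (length (mysequence (replicate n "012")))

mysequence :: Monad m => [m a] -> m [a]
mysequence []     = return []
mysequence (m:ms) = do
  x  <- m
  xs <- mysequence ms
  return (x:xs)

Whether this really stays in constant space is worth checking with +RTS -t, exactly as above; the measurements for Simple2 with -O -fno-full-laziness earlier in the post suggest it will.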