# Haskell Magic: Making hylos faster

This is a follow-up to my previous post on this subject, which talked about hylomorphisms from a purely semantic (and usage-oriented) point of view. However, since then, someone brought Bananas in Space to my attention. This rather neat paper, in addition to many other things, discusses a few optimizations which we can safely make to a range of recursion schemes, due to the formal guarantees that they provide. The one for hylomorphisms is neat and elegant, but is given scant mention by the authors, due to it being so obvious (at least to them). Given that we're not all as smart as they are, I figured a blog post elucidating this particular optimization would be cool. In addition, the correctness proof for their improved hylomorphism is such a great example of equational reasoning (at least, in my opinion) that I wanted to share it with the world.

## Recap

For clarity, here are a bunch of definitions that we'll need throughout this post. All of these are described in more detail in my previous post, as well as the links therein.

```import Control.Category ((>>>))

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

newtype Term f = In { out :: f (Term f) }

cata :: Functor f => Algebra f a -> Term f -> a
cata f = out >>> fmap (cata f) >>> f

ana :: Functor f => Coalgebra f a -> a -> Term f
ana f = f >>> fmap (ana f) >>> In

hylo :: Functor f => Coalgebra f a -> Algebra f b -> a -> b
hylo f g = ana f >>> cata g
```

Now, this definition of a hylomorphism is certainly correct, but at the same time, it's not as efficient as it could be. The reason for this? We have to complete `ana f` before we can get started on `cata g`, which means we'll need to build up the entire intermediate structure before tearing it down, even if we don't need to. While in some cases, our compiler might be able to optimize this away, it's not necessarily a given, especially if what we're doing is complex.

## Doing it better

Luckily for us, Meijer and Hutton wrote an improved definition of hylomorphisms:

```hylo' :: Functor f => Coalgebra f a -> Algebra f b -> a - > b
hylo' f g = f >>> map (hylo' f g) >>> g
```

There are a few nice things about it:

1. Provably equivalent in terms of semantics to the previous definition
2. Doesn't rely on the compiler to optimize generation of the intermediate structure
3. No longer depends on `ana` and `cata` at all (and can be used to define both if needs be)

We'll now examine, and demonstrate, each of these claims, and why they are true, in turn.

### Equivalence

In order to make sure that `hylo'` is correct, we have to demonstrate that no matter what algebra and coalgebra you give it, and what initial value is fed to it, we will get the same outcome as if we had used `hylo` instead. Something like this requires a proof - and in languages other than Haskell, this might be a hard ask. Luckily for us, thanks to Haskell's strong immutability and purity guarantees, we can use a technique called equational reasoning. Essentially, if we start with x, and then make some finite number of replacements of forms with other, equivalent forms, to produce y, we know that the semantics of x and y are the same. So let's try that.

We're going to make use of a few such equivalent forms, so let's first discuss them. One really obvious equivalence we can exploit is that a function's definition is always equivalent to a call. For example:

```square :: Int -> Int
square x = x * x

{-
Every time we see:

square n

We can replace it with:

n * n

Likewise, every time we see:

m * m

We can replace it with:

square m
-}
```

As obvious as this might seem, not every language can make such promises, mostly because of side effects. Thanks to Haskell keeping these isolated, and forcing us to be honest if a function ever performs them, we can reason this way in general. Another obvious equivalence is the duality of a newtype constructor and its 'unwrapper' function:

```newtype Foo x = { unFoo :: x }

{-
Every time we see:

Foo >>> unFoo

We can replace it with:

id
-}
```

Yet another useful equivalence for us is the associativity of `(>>>)`; whenever we have `f >>> g >>> h`, it doesn't matter if we do ```(f >>> g) >>> h``` or `f >>> (g >>> h)`, as the result will be the same. This is actually a requirement of the laws of `Category`. The only (reasonably) non-obvious equivalence we will use is the second functor law:

```forall f g . fmap f >>> fmap g == fmap (f >>> g)
```

This law is actually a free theorem in Haskell, and follows directly from the first functor law (`fmap id == id`). Surprisingly enough, this is everything we will need to demonstrate the equivalence of `hylo` and `hylo'`!

So, let's begin. Let `f` be a coalgebra and `g` be an algebra. We start with:

```hylo f g
```

By replacing `hylo` with its definition, we get:

```ana f >>> cata g
```

By replacing both `ana` and `cata` with their respective definitions, we get:

```(f >>> fmap (ana f) >>> In) >>> (out >>> fmap (cata g) >>> g)
```

Thanks to the associativity of `(>>>)`, we can re-bracket this:

```(f >>> fmap (ana f)) >>> (In >>> out) >>> (fmap (cata g) >>> g)
```

Seeing as `In` is a newtype constructor and `out` is its 'unwrapper', we can replace `(In >>> out)` with `id`. As `id` simply returns its argument as-is, we can remove it from the 'pipeline' above:

```(f >>> fmap (ana f)) >>> (fmap (cata g) >>> g)
```

Again, some re-bracketing enabled by the associativity of `(>>>)`:

```f >>> (fmap (ana f) >>> fmap (cata g)) >>> g
```

Now, by applying the second functor law, we get:

```f >>> fmap (ana f >>> cata g) >>> g
```

However, as we know, `ana f >>> cata g` is just `hylo f g`:

```f >>> fmap (hylo f g) >>> g
```

This is just the definition of `hylo'` with a different name. Thus, our proof by equational reasoning is complete.

### Improved efficiency

Let's now see what effect this can have on an existing problem: my mergesort example from last time. Here's its code, for clarity:

```import qualified Data.Vector as V

merge :: Ord a => V.Vector a -> V.Vector a -> V.Vector a
merge v1 v2 = evalState (V.replicateM totalLength go) (0,0)
where v1Length = V.length v1
v2Length = V.length v2
totalLength = v1Length + v2Length
go = do (f1, f2) <- get
if f1 < v1Length && (not (f2 < v2Length) || (v1 V.! f1 <= v2 V.! f2))
then put (f1 + 1, f2) >> return (v1 V.! f1)
else put (f1, f2 + 1) >> return (v2 V.! f2)

data WorkTreeF a s = Leaf a | Bin s s

instance Functor (WorkTreeF a) where
fmap _ (Leaf x) = Leaf x
fmap f (Bin x y) = Bin (f x) (f y)

breakDown :: Coalgebra (WorkTreeF (V.Vector a)) (V.Vector a)
breakDown v = case V.length v of
1 -> Leaf v
x -> let half = x `div` 2 in
Bin (V.slice 0 half v) (V.slice half (x - half) v)

mergeAlg :: Ord a => Algebrea (WorkTreeF (V.Vector a)) (V.Vector a)
mergeAlg (Leaf v) = v
mergeAlg (Bin v1 v2) = merge v1 v2

mergeSort :: Ord a => V.Vector a -> V.Vector a
mergeSort = hylo breakDown mergeAlg
```

So let's see what happens when we use `hylo'` to mergesort the array ```[2, 3, 1, 1, 10, 0, 14, 4]```. According to `hylo'`, we have to first feed our array to our coalgebra. As our array has more than 1 element, that means we construct a two-leaf `WorkTreeF` that looks like this: Next, we have to `fmap hylo'` over this structure using the same algebra and coalgebra we started with. By our definition, we first have to apply `hylo'` to the left subtree. Finding a `Leaf` storing a non-empty array there, we repeat the process, applying the coalgebra to make another two-leaf tree: After that, we have to `fmap hylo'` again, and as previously, we must first do it to the left subtree. Eventually, we'll end up here, having created two `Leaf` nodes containing singleton arrays:

2 and 3" src="{static}/images/hylosort-3.svg">

However, at this point, `fmap hylo'` will do nothing - there's no more dividing we can do. Thus, by definition, we now have to apply our catamorphism, giving us back our singleton arrays on each side:

2 and 3" src="{static}/images/hylosort-4.svg">

We then have to apply our catamorphism again to merge them:

2 and 3. The subtree has been replaced by a single node [2,3]." src="{static}/images/hylosort-5.svg">

Repeating this process, we eventually collect together the entire left subtree of the root: There are two important observations we can make:

• Throughout this entire time, the right subtree of the root has not been touched
• Now that we're about to 'start' on the right subtree, the left subtree has been fully collapsed

Thus, we never actually have to generate the entire structure - we only have to have one branch fully 'materialized' at any one time. While in theory, a good compiler could do this for us without forcing us to rewrite `hylo`, doing it ourselves gives us two good outcomes:

• We can be sure that it happens; and
• It will now benefit everything we wrote using hylos, regardless of what they operate on.

### Recovering the ana- and catamorphism

Having gotten rid of `ana` and `cata` for defining hylomorphisms, we'll now demonstrate that, in fact, we can define both of them using `hylo'`:

```ana' :: Functor f => Coalgebra f a -> a -> Term f
ana' f = hylo' f In

cata' :: Functor f => Algebra f a -> Term f -> a
cata' = hylo' out
```

These definitions become more obvious when we consider the types of `In` and `out`:

```In :: f (Term f) -> Term f
-- that is, Algebra f (Term f)
out :: Term f -> f (Term f)
-- that is, Coalgebra f (Term f)
```

We can also do some equational reasoning to reconstruct our original definition for `ana` and `cata`. For `ana`, the chain of substitutions goes like this:

```ana' f -- initial
hylo' f In -- by definition of ana'
f >>> fmap (hylo' f In) >>> In -- by definition of hylo'
f >>> fmap (ana' f) >>> In -- by definition of ana'
```

While for `cata`, it goes like this:

```cata' f-- initial
hylo' out f -- by definition of cata'
out >>> fmap (hylo' out f) >>> f -- by definition of hylo'
out >>> fmap (cata' f) >>> f -- by definition of cata'
```

Clearly, aside from a slight difference in naming, we've recovered both `ana` and `cata` only using `hylo'`.

## Conclusion

We've taken a fun tour of how we can improve `hylo`, and discovered some clever things along the way. Additionally, I hope that this can demonstrate the benefits of Haskell, if not least of all due to the power of equational reasoning. Remember, think hard and have fun!