Explaining the Abstract Algorithm  Part 1
May 08, 2019
Warnings: Haskell knowledge required.
Creating a Pathological Example
In my last post on the Abstract Algorithm, I explained several important terms such as redex and Optimal. REad that before this one. Now I’m going to explain how the Abstract Algorithm works. Let’s set up an example where the Abstract Algorithm really shines.
id x = x
pair p = (p, p)
part_1 x = x id
part_2 y = pair (y 3)
to_reduce = part_1 part_2
 to_reduce == (3, 3)
This might seem really convoluted  let’s add some type annotations. We’ll say that the 3
is an Int
for simplicity.
id x = x
pair p = (p, p)
part_1 :: ((p > p) > t) > t
part_1 x = x id
part_2 :: (Int > b) > (b, b)
part_2 y = pair (y 3)
to_reduce :: (Int, Int)
to_reduce = part_1 part_2
 to_reduce == (3, 3)
Now, let’s lazily reduce to_reduce
. First we replace part_1
with its definition.
to_reduce = (\x > x id) part_2
Ok, cool, we can just apply the function.
to_reduce = part_2 id
Now the only way to reduce this further is to replace part_2
with its definition.
to_reduce = (\y > pair (y 3)) id
Then we can just apply the function.
to_reduce = pair (id 3)
Now replace pair
with its definiton.
to_reduce = (\p > (p, p)) (id 3)
Then apply the function.
to_reduce = (id 3, id 3)
And then I think it’s pretty obvious from here how this turns into (3, 3)
.
So, there’s the same problem as last time. Namely, the id 3
function gets duplicated and evaluated twice. GHC is smart enough to catch this, even with optimizations turned off, and share the value. This is because GHC’s evaluation strategy is more complicated than just straightforward lazy evaluation. See this StackOverflow question for more info and why Haskell’s strategy does still have issues.
What Is a Virtual Redex
If you remember, a redex
is any expression that can be reduced, such as (\x > x) 3
can be reduced to just 3
.
Let’s look at the definition of part_2
:
part_2 y = pair (y 3)
What you should notice here is that (y 3)
is not a redex, since we don’t know what function y
will be. Just looking inside the part_2
function, we have no information that would allow us to reduce part_2
. But, later in the reduction of these functions, we will eventually replace y
with id
, which will allow us to easily reduce it. But as of right now, (y 3)
is not a redex since it can’t be reduced. Since we hope that we will one day be able to reduce (y 3)
, we’ll call it a “virtual redex”.
So, is it possible to reduce a virtual redex? It is, and here’s how. Here is our example again:
id x = x
pair p = (p, p)
part_1 x = x id
part_2 y = pair (y 3)
to_reduce = part_1 part_2
We’re trying to reduce to_reduce
, but we want to reduce the (y 3)
expression within part_2
early for the sake of demonstration. We identify that we can’t do anything with (y 3)
because we don’t know what y
represents, and the reason we don’t know what y
represents is because it’s passed as a parameter to part_2
. So now we “rise up” and replace part_2
with it’s value.
id x = x
pair p = (p, p)
part_1 x = x id
 part_2 y = pair (y 3)
to_reduce = part_1 (\y > pair (y 3))
Now, we still don’t know the value of y
, so we rise up again:
id x = x
pair p = (p, p)
 part_1 x = x id
 part_2 y = pair (y 3)
to_reduce = (\x > x id) (\y > pair (y 3))
Now we can reduce this:
id x = x
pair p = (p, p)
 part_1 x = x id
 part_2 y = pair (y 3)
to_reduce = (\y > pair (y 3)) id
And then we can reduce this one more time.
id x = x
pair p = (p, p)
 part_1 x = x id
 part_2 y = pair (y 3)
to_reduce = pair (id 3)
And now, finally, we can reduce the (y 3)
since we now know it’s really (id 3)
id x = x
pair p = (p, p)
 part_1 x = x id
 part_2 y = pair (y 3)
to_reduce = pair 3
It might thing that we’re just reducing until we get a value to pass to the virtual redex. But we’re actually being a bit more direct than that. Visualize it like a tree:
to_reduce
$
/ \
/ \
/ \
part_1 x part_2 y ←.
$ $ \
/ \ / \ \
/ \ / \ \
/   \ \
x id x pair p $ ← our virtual redex /
 / \ we want to reduce this but we can't /
x y 3 because we get the value of y from up here  `
The $
represents function application, same as normal Haskell. When I write part_1 x
, it represents the function part_1
which takes the parameter x
(I really should write \x > part_1 x
, but we have space constraints here). With this visualization we can present a more principled way of reducing the virtual redex y 3
. Also I’ve left off the definition of pair
just for brevity’s sake.
So, to evaluate the virtual redex, we start at the $
.
to_reduce
$
/ \
/ \
/ \
part_1 x part_2 y
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $ ← current position
 / \
x y 3
Now we examine the left side of $
(y
), and find we don’t know what it is!
to_reduce
$
/ \
/ \
/ \
part_1 x part_2 y
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $
 / \
x y 3
⮤ current position
This means that this expression can’t be reduced until we get more information about y
. If we knew what y
was going to be, we could reduce this further. To figure out what y
is, let’s jump to where it is introduced.
to_reduce
$
/ \
/ \
/ \
part_1 x part_2 y ← current position
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $
 / \
x y 3
Yes, y
is introduced at part_2
. As of right now, we still don’t know anything about y
, so let’s defer this question right now. For now, we want to answer if part_2
is the left part of a function application, because if it were we could find the value of y
just by looking at the right part of that function application. So let’s say we “push the question of ‘what is y
’ onto the stack and we arenow trying to see if the part_2
function is applied to any values”. Let’s make a new area to keep track of what mysteries we’re solving.

to_reduce  mysteries to solve 
$   
/ \  what is part_2 applied to?  ← current mystery
/ \  what is y? 
/ \ 
part_1 x part_2 y ← current position
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $
 / \
x y 3
You can think of the mysteries as a stack  we’re always trying to find the top one, because we need to to solve any below it. If we knew what part_2
was applied to, whatever it was applied to would be the value of y
. So, let’s “rise up” and see where part_2
ends up getting used.
to_reduce ← current pos 
$  mysteries to solve 
/ \   
/ \  what is part_2 applied to?  ← current mystery
/ \  what is y? 
part_1 x part_2 y 
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $
 / \
x y 3
Now we see part_2
is being passed as a parameter to part_1
. This doesn’t really help us yet, we’re looking for a situation where part_2
has a parameter passed to it (we want part_2
to be on the left side of the function application, here it’s on the right side). The procedure now is to go down and see how it’s used.
to_reduce 
$  mysteries to solve 
/ \   
/ \  what is part_2 applied to?  ← current mystery
/ \  what is y? 
current pos ⭢ part_1 x part_2 y 
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $
 / \
x y 3
part_1
is comprised of a function application. The parameter x
of part_1
is part_2
. Let’s go down to see what’s on the left side.
to_reduce 
$  mysteries to solve 
/ \   
/ \  what is part_2 applied to?  ← current mystery
/ \  what is y? 
part_1 x part_2 y 
$ $
/ \ / \
/ \ / \
/   \
x id x pair p $
 / \
x y 3
Finally! x
, which represents part_2
, is on the left side of a function appplication! Now let’s look at what’s on the right side.
to_reduce 
$  mysteries to solve 
/ \   
/ \  what is part_2 applied to?  ← current mystery
/ \  what is y? 
part_1 x part_2 y 
$ $
/ \ / \
/ \ / \
/   \
current pos ⭢ id x pair p $
 / \
x y 3
looks like the id
function is on the right side  this solves our first mystery. part_2
is applied to id
. Let’s adjust what we’ve written down.
to_reduce 
$  mysteries to solve 
/ \   
/ \  what is y?  ← current mystery
/ \ 
part_1 x part_2 y
$ $
/ \ / \
/ \ / \
/   \
current pos ⭢ id x pair p $
 / \
x y 3
We now finally know what y
is, it’s the id
function. With this, we can reduce our to_reduce
function. Remember, to_reduce
involved applying part_1
to part_2
. So the first step is to apply part_1
to part_2
.
to_reduce to_reduce
$ $
/ \ / \
/ \ / \
/ \ / \
part_1 x part_2 y part_2 y id x
$ $ > $ 
/ \ / \ / \ x
/ \ / \ / \
/   \ / 
x id x pair p $ pair $
 / \ / \
x y 3 y 3
In haskell, this would look like this:
id x = x
pair p = (p, p) id x = x id x = x
part_1 x = x id > pair p = (p, p) > pair p = (p, p)
part_2 y = pair (y 3) part_2 y = pair (y 3) part_2 y = pair (y 3)
to_reduce = part_1 part_2 to_reduce = (\x > x id) part_2 to_reduce = part_2 id
 replace part_1 with its  reduce
 definition.
(I added the extra step of replacing part_1
with its definition that we don’t have to do when we visualize it as a tree)
Then the next step is to reduce part_2
!
to_reduce to_reduce
$ $
/ \ / \
/ \ / \
/ \ / \
part_2 y id x pair p $
$  > / \
/ \ x id x 3
/ \ 
/  x
pair $
/ \
y 3
id x = x
pair p = (p, p) > id x = x id x = x
part_2 y = pair (y 3) pair p = (p, p) > pair p = (p, p)
to_reduce = part_2 id to_reduce = (\y > pair (y 3)) id to_reduce = pair (id 3)
 replace part_2 with its
 definition.
Then we just apply id
!
to_reduce to_reduce
$ $
/ \ / \
/ \ > / \
/ \ / \
pair p $ pair p 3
/ \
id x 3

x
id x = x
pair p = (p, p) > pair p = (p, p) > pair p = (p, p)
to_reduce = pair (id 3) to_reduce = pair ((\x > x) 3) to_reduce = pair 3
 replace id with its
 definition.
We have reduced the “virtual redex”! (y 3)
has been converted to its true self, 3
. But the interesting thing is that we’ve reduced the virtual redex in an optimal number of steps. Let’s be more precice. Remember how we were moving around that current position arrow all around the graph earlier? Let’s label all the edges that we visited on the tree.
to_reduce
$
/ \
c/ \b
/ \
part_1 x part_2 y
$ $
/ \ / \
d/ \e / \
/   \
x id x pair p $
 a/ \
x y 3
We started at (y 3)
, so our initial position is the $
connected to the edge labelled a
. You can think of a reduction as combining multiple edges.
to_reduce to_reduce to_reduce
$ $ $
/ \ / \ / \
c/ \b bcd/ \e / \
/ \ / \ / \
part_1 x part_2 y part_2 y id x pair p $
$ $ > $  > abcde/ \
/ \ / \ / \ x id x 3
d/ \e / \ / \ 
/   \ /  x
x id x pair p $ pair $
 a/ \ a/ \
x y 3 y 3
First we reduce the application of part_1
to part_2
, which has the effect of combining the b
, c
, and d
edges, which we’ll call bcd
. Then we reduce the application of part_2
to id
, which combines the a
, bcd
, and e
edges. This is the absolute minimum we have to do to convert the virtual redex (y 3)
to a the redex (id 3)
.
What we’re really done here is described in a very high level way how to 1) find the shortest path between a mysterious element in a virtual redex and 2) use that path to optimally reduce a virtual redex. So, any reduction algorithm that is optimal must:

Compute the normal form of am expression without reducing any path corresponding to a virtual redex more than once.

Compute the normal form of am expression without reducing any uneccsary path.
This concept of virtual redexes is very important for understanding the abstract algorithm. For more information, see part 2 once it comes out!
Written by Andre Popovitch, a Michigan State student who wants to change the world. You should follow him on Twitter