Monday, December 15, 2014

A commentary on 24 days of GHC extensions

Ollie Charles has continued his excellent series of Christmas Haskell blogs. This year he talks about 24 Days of GHC Extensions. His blog posts are an excellent technical introduction to various Haskell extensions. Reading them inspired me to write some non-technical comments about the various extensions; giving a little bit of history and personal comments about them.

Day 1: View Patterns

View patterns have a long history. As far as I know views were first suggested by Phil Wadler in 1987, Views: a way for pattern matching to cohabit with data abstraction. As the title of the paper suggests, it was about being able to pattern match on abstract data types. Variations of Phil's suggestions have been implemented several times (I did it both in LML and for Haskell in hbc). In all these early suggestions the conversion between the concrete type and the view type were implicit, whereas in the ViewPatterns extension the conversion is explicit.

The addition of view patterns was partly spurred by the fact that F# has a very neat way of introducing pattern matching for abstract types, called active patterns. Since Simon Peyton Jones is in the same research lab as Don Syme it's natural that Simon couldn't let F# have this advantage over Haskell. :)

Day 2: Pattern Synonyms

Pattern synonyms is a more recent addition. The first time I remember hearing the idea was this the paper Abstract Value Constructors, and ever since then I wished that Haskell would have something like that. Patterns were one of the few constructs in Haskell that could not be named and reused. The first time they were added to Haskell was in the SHE preprocessor.

At a Haskell hackathon in Cambridge, August 2011, Simon Peyton Jones, Simon Marlow and I met to hash out how they could be added to GHC. I wanted to go beyond simple pattern synonyms. The simplest kind is uni-directional synonyms which can only occur in patterns. The simply bi-directional synonyms can be used both in patterns and expressions, but are limited in what they can do. The explicit bi-directional synonyms allow the full power of both patterns and expressions. With explicit bi-directional synonyms and view patterns we can finally implement views as envisioned by Phil, but now broken down into smaller reusable pieces. The result of our discussions at the hackathon can ge found here. But we were all too busy to implement any of it. All the hard implementation work, and fixing all the additional snags found, was done by Gergő Érdi.

I find this a very exciting extension, and you can take it even further, like Associated Pattern Synonyms in class declarations.

Day 3: Record Wildcards

The record wildcard extension allows you to open a record and get access to all the fields without using qualified names for them. The first time I encountered this idea was in Pascal which as a with-statement. It looks like with expr do stmt where the expr is a record value and inside the stmt all the fields of the record can be accessed directly. The construct later appeared in Ada as well, where it's called use.

So having something similar in Haskell doesn't seem so far fetched. But in was actually the dual, namely to construct values using record wildcard notation that inspired me to make this extension.

In 2006 I was developing the Paradise EDSL which is a DSL embedded in Haskell for describing computations and user interfaces. I wanted a way to make the EDSL less verbose and that's why record wildcards came about. Here's a simple example to illustrate the idea. We want to be able to input a coordinate record.

data Coord = Coord { x :: Double, y :: Double }
    coord :: P Coord
    coord = do
        x <- input 0.0
        y <- input 0.0
        return Coord{..}
This says that x and y are input and that their default value is 0. We need to have an input for every field in the Coord record and at the end we need to construct the record itself. Without the record wildcard the construction would have been Coord{x=x,y=y}. This isn't too bad, but for hundreds of inputs it gets tedious.

I made a quick and dirty implementation of this in GHC, but it was too ugly to get into the official release. Instead Dan Licata reimplemented (and extended) it under Simon PJ's supervision into what we have today.

I'm actually quite ambivalent about this extension. It's incredibly convenient (especially in the pattern form), but it introduces new bound names without an explicit mention of the name in the binder. This makes it harder to read code. The same objection can be made about the Haskell import declaration when it lacks an import list.

Day 4: Bang Patterns

I don't have much to say about BangPatterns. One day they simply appeared in a new GHC version. :)

The Clean language has long has something similar, but in Clean the annotation goes on the type instead of on the pattern. In Haskell it seems like a nice duality to have it on the pattern since there also lazy patterns introduced by ~.

Day 5: Rebindable Syntax

The rebindable syntax is almost like going back the older Haskell reports. They lacked an exact specification of where the identifier introduced by desugaring were bound. Of course, it was resolved so that they always refer to the Prelude identifiers. But when experimenting with other preludes it's very useful to be able to override this. Which is exactly what RebindableSyntax allows you to do.

In my opinion, this extension should be a little different. It ought to be possible to give a module name for where the names should be found. So normally this would be Prelude, but I'd like to be able to say RebindableSyntax=MyPrelude, and then all names introduced by desugaring will be found in MyPrelude (even if they are not in scope).

Day 6: List comprehensions

This bundles together a number of list comprehension extensions.

First, MonadComprehensions, which allows list comprehension syntax for any monad. This is simply going back to Haskell 1.4 where monad comprehensions were introduced. The monad comprehension syntax had been used by Phil Wadler before that. Monad comprehension were removed from 1.4, because they were deemed too confusing for beginners. Monad comprehensions were brought back to GHC by George Giorgidze et al.

The ParallelListComp allows zip like operation to be expression in the comprehension. The idea originates from Galois in the design of Cryptol. John Launchbury, Jeff Lewis, and Thomas Nordin were turning crypto networks into recursive equations and they wanted a nicer notation than using zipWith. (Thanks to Andy Adams-Moran for the history lesson.)

The origins TransformListComp are simple. It's just a way to bring more of the power of SQL into list comprehensions. It's an extension that introduces far too much special syntax for my taste, but the things you can do are quite neat.

Labels:

Saturday, April 05, 2014

Haskell error reporting with locations, update

Since some people (I'm among them) dislike impure features in Haskell I thought I'd present a slight variation on the error location feature that is "pure".

First, the __LOCATION__ variable gets an abstract type. So

  data Location
  __LOCATION__ :: Location
It's defined in the Prelude and always in scope. The type cannot be compared, shown, or anything. There's just one thing that can be done, namely:
  extractLocation :: Location -> IO String
The error function needs a new exception to throw
  data ErrorCallLoc = ErrorCallLoc Location String

  {-# LOCATIONTRANSPARENT error #-}
  error :: String -> a
  error s = throw (ErrorCallLoc __LOCATION__ s)
This means that the location string cannot be used when we throw the error. But it can be used where the error is caught, since this can only be done in the IO monad.

Under the hood the everything is just as before, Location is just a string. It just can't be manipulated except in the IO monad, so we can pretend it's pure.

  newtype Location = Location String
  extractLocation (Location s) = return s
It now looks a lot like Michael Snoyman's proposal.

Friday, April 04, 2014

Haskell error reporting with locations

Error reporting in GHC is not always the nicest. For example, I often develop code by using undefined as a placeholder for code I've not written yet. Here's a simple example:
import System.Environment
  main = do
    args <- getargs
    if null args then
      undefined
     else
      undefined
Running this looks like this:
$ ./H
  H: Prelude.undefined
Which undefined caused that? Looking at the error message we have no idea. Wouldn't it be nice with some location information?

We can actually get location information by using Control.Exception.assert:

import Control.Exception(assert)
  import System.Environment

  main = do
    args <- getargs
    if null args then
      assert False undefined
     else
      assert False undefined
Now running it is much more informative:
$ ./H
  H: H.hs:7:9-14: Assertion failed
How is assert able to report the location? If we dig deep enough we discover that it's because the ghc compiler contains a special hack to recognize this function and give it location information.

A generalized hack

In a Haskell compiler that I've implemented I've taken this compiler hack and extended it so it can be used for any function.  It comes in two parts, location information and location transparent definitions.

__LOCATION__

The __LOCATION__ identifier is always defined and utterly magical. Its value is a string that describes the location of that very identifier. This is the very opposite of a referentially transparent name. In fact it's value varies with where it is placed in the code! So it's definitely not for purists. But I'm a practical man, so I sometimes have resort of the ugliness of reality. And in reality we want to report locations in errors.

Enough philosophy, here's an example:

main = do
    print __LOCATION__
    print   __LOCATION__
And running it prints:
"test/Test.hs:2:11"
  "test/Test.hs:3:13"
And to illustrate the impurity:
main = do
    let loc = __LOCATION__
    print loc
    print loc
And running this:
"test/Test.mu:2:15"
  "test/Test.mu:2:15"

Location transparency

The __LOCATION__ identifier gives the location of itself. This is of little use on its own. Imagine the definition we could give for undefined. Somewhere in the Prelude module it could say something like
undefined = error ("undefined: " ++ __LOCATION__)
But if we use this all that it will tell us is where the definition of undefined is, not where it was used.

To get the point of use instead of the definition I've introduced location transparent definitions. In a location transparent definition the __LOCATION__ identifier will not refer to its own position, but to the position of the reference to the definition. Location transparency is introduced with a pragma.

{-# LOCATIONTRANSPARENT undefined #-}
  undefined = error ("undefined: " ++ __LOCATION__)
With this definition our initial example looks like this when we run it:
undefined: test/H.hs:6:9
In fact, the real definition of undefined doesn't look like that. The __LOCATION__ identifier is only used in the definition of error, so it looks something like this:
{-# LOCATIONTRANSPARENT error #-}
  error :: String -> a
  error s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))

  {-# LOCATIONTRANSPARENT undefined #-}
  undefined = error "undefined"
Since both error and undefined are transparent any use of undefined will be reported with the location of the use.

Furthermore, we can make a few more functions location transparent, e.g.,

{-# LOCATIONTRANSPARENT head #-}
  head :: [a] -> a
  head [] = error "Empty list"
  head (x:xs) = x
A simple example:
main = putStr (head [])
Which will print:
test/Head.hs:1:16: Empty list
which is the location where head was called.

Implementation

There are different ways to implement this feature, and I'm going to sketch two of them.

First: Every function that has the LOCATIONTRANSPARENT pragma will be inlined at the point of use, and the __LOCATION__ identifier in the inlined code will be updated to reflect the call site. The definitions must be processed in a bottom-up fashion for this to work. It's fairly simple to implement, but will cause some code bloat due to inlining.

Second: Every function that has LOCATIONTRANSPARENT pragma will be rewritten (by the compiler) to have an extra location argument, and each use of this function will be rewritten to pass in the current location. For example (using $$f for the location version of f):

main = putStr ($$head __LOCATION__ [])

  $$head __LOCATION__ [] = $$error __LOCATION__ "Empty list"
  $$head __LOCATION__ (x:xs) = x
  $$error __LOCATION__ s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))
This should be fairly straightforward to implement, but I've not tried it. (It's somewhat like dynamic binding, so maybe ghc could reuse that mechanism for locations.)

And, of course, the global __LOCATION__ identifier has to be recognized by the compiler and replaced by a string that is its location.

Conclusion

I implemented the __LOCATION__ hack quite a while ago, and I like the much improved reporting of error locations. I hope someone will add it to ghc as well.

Labels:

Thursday, April 03, 2014

A small Haskell extension

The extension

In Haskell you can give a type to an expression by writing expr ::  type.  To an untrained eye the :: looks just like an infix operator, even if it is described as a special syntactical construct in the Haskell report.  But let's think about it as an infix operator for a moment.

For an infix operator you you can for a section, i.e., a use of the operator with one operand left out.  For instance (* 2) leaves out the first operand, and Haskell defines this to be the same as (\ x -> x * 2).  Regarding :: as an operator we should be able to write (:: type) and it should have the obvious meaning (\ x -> x :: type).

I suggest, and I plan sending the haskell-prime mailing list, Haskell should adopt this small extension.

Why?

First, the extension is very light weight and has almost no extra intellectual weight for anyone learning Haskell.  I'd argue it makes the language simpler because it allows :: to be treated more like an infix operator.  But without use cases this would probably not be enough of an argument.

Example 1

We want to make a function, canonDouble, that takes a string representing a Double and changes it to the standard Haskell string representing this Double.  E.g. canonDouble "0.1e1" == "1.0".  A first attempt might look like this:

  canonDouble :: String -> String
  canonDouble = show . read         -- WRONG!

This is, of course, wrong since the compiler cannot guess that the type between read and show should be a Double.  We can convey this type information in different ways, e.g.:

  canonDouble :: String -> String
  canonDouble = show . asDouble . read
    where asDouble :: Double -> Double
          asDouble x = x

This is somewhat clumsy.  Using my proposed extension we can instead write:

  canonDouble :: String -> String
  canonDouble = show . (:: Double) . read

This has the obvious meaning, and succinctly describes what we want.

Example 2

In ghc 7.8 there is a new, better implementation of Data.Typeable.  It used to be (before ghc 7.8) that to get a TypeRep for some type you would have to have a value of that type.  E.g., typeOf True gives the TypeRep for the Bool type.  If we don't have a value handy of the type, then we will have to make one, e.g., by using undefined.  So we could write typeOf (undefined :: Bool).

This way of using undefined is rather ugly, and relies on non-strictness to work.  Ghc 7.8 add a new, cleaner way of doing it.

  typeRep :: proxy a -> TypeRep

The typeRep function does not need an actual value, but just a proxy for the value.  A common proxy is the Proxy type from Data.Proxy:

  data Proxy a = Proxy

Using this type we can now get the TypeRep of a Bool by writing typeRep (Proxy :: Proxy Bool).  Note that in the type signature of typeRep the proxy is a type variable.  This means we can use other values as proxies, e.g., typeRep ([] :: [Bool]).

We can in fact use anything as a proxy that has a structure that unifies with proxy a.  For instance, if we want a proxy for the type T we could use T -> T, which is the same as (->) T T.  The (->) T part makes of it is the proxy and the last T makes up the a.

The extension I propose provides an easy way to write a function of type T -> T, just write (:: T).  So to get a TypeRep for Bool we can simply write typeRep (:: Bool).  Doesn't that look (deceptively) simple?

In fact, my driving force for coming up with this language extension was to get an easy and natural way to write type proxies, and I think using (:: T) for a type proxy is a as easy and natural as it gets (even if the reason it works is rather peculiar).

Implementation

I've implemented the extension in one Haskell compiler and it was very easy to add and it works as expected.  Since it was so easy, I'll implement it for ghc as well, and the ghc maintainers can decide if the want to merge it.  I suggest this new feature is available using the language extension name SignatureSections.

Extensions

Does it make sense to do a left section of ::?  I.e., does (expr ::) make sense?  In current Haskell that does not make sense, since it would be an expression that lacks an argument that is a type.  Haskell doesn't currently allow explicit type arguments, but if it ever will this could be considered.

With the definition that (:: T) is the same as (\ x -> x :: T) any use of quantified or qualified types as T will give a type error.  E.g., (:: [a]), which is (\ x -> x :: [a],  is a type error.  You could imagine a different desugaring of (:: T), namely (id :: T -> T).  Now (:: [a]) desugars to (id :: [a] -> [a]) which is type correct.  In general, we have to keep quantifiers and qualifiers at the top, i.e., (:: forall a . a) turns into (id :: forall a . a -> a).

Personally, I'm not convinced this more complex desugaring is worth the extra effort.

Labels:

Saturday, July 09, 2011

Impredicative polymorphism, a use case

In a recent question on stackoverflow I made a comment about how Haskell can be considered a good imperative language because you can define abstractions to make it convenient. When I was going to make my point by implementing a simple example of it I found that what I wanted to do no longer works in ghc-7.0.4. Here's a simple example of what I wanted to do (which works in ghc-6.12.3). It's a simple library that makes Haskell code look a bit like Python.

{-# LANGUAGE ExtendedDefaultRules, TupleSections #-}
module Main where
import qualified Prelude
import Boa

last_elt(xs) = def $: do
    assert xs "Empty list"
    lst <- var xs              -- Create a new variable, lst
    ret <- var (xs.head)
    while lst $: do
        ret #= lst.head        -- Assign variable ret
        lst #= lst.tail
    return ret

first_elt(xs) = def $: do
    l <- var xs
    l.reverse                  -- Destructive reverse
    return (last_elt(l))

factorial(n) = def $: do
    assert (n<=0) "Negative factorial"
    ret <- var 1
    i <- var n
    while i $: do
        ret *= i
        i -= 1
    return ret

test = def $: do
    print "Hello"
    print ("factorial 10 =", factorial(10))

main = do
    test
    l <- varc [1, 2, 3]
    print ("first and last:",)
    print (first_elt(l),)
    print (last_elt(l))

On the whole it's pretty boring, except for one thing. In imperative languages there is (usually) a disctinction between l-values and r-values. An l-value represent a variable, i.e., something that can be assigned, whereas an r-value is simply a value. So in Python, in the statement x = x + 1 the x on the left is an l-value (it's being assigned), whereas on the right it's an r-value. You can use the same notation for both in most imperative languages. If you want to do the same in Haskell you have (at least) two choices. First you can unify the concepts of l-value and r-value and have a runtime test that you only try to assign variables. So, e.g., 5 = x would type check but have a runtime failure. I will not dwell further on this since it's not very Haskelly.
Instead, we want to have l-values and r-values being statically type checked. Here's the interesting bit of the types for a simple example.
 
data LValue a
data RValue a
instance (Num a) => Num (RValue a)

class LR lr
instance LR RValue
instance LR LValue

var :: RValue a -> IO (forall lr . (LR lr) => lr a)
(#=) :: LValue a -> RValue a -> IO ()

foo = do
    x <- var 42
    x #= x + 1

We have two type constructors LValue and RValue representing l-values and r-values of some type a. The r-values is an instance of Num. Furthermore, the class LR where the type is either LValue or RValue.
The var function creates a new variable given a value. The return type of var is the interesting part. It says that the return value is polymorphic; it can be used either as an l-value or as r-value. Just as we want.
The assignment operator, (#=), takes an l-value, an r-value, and returns nothing.

So in the example we expect x to have type forall lr . (LR lr) => lr a, in which case the assignment will type check.
If we try to compile this we get
    Illegal polymorphic or qualified type:
      forall (lr :: * -> *). LR lr => lr a
    Perhaps you intended to use -XImpredicativeTypes
    In the type signature for `var':
      var :: RValue a -> IO (forall lr. LR lr => lr a)

The problem is that universal quantification is not normally allowed as the argument to a type constructor. This requires the impredicative polymorphism extension to ghc. If we turn it on it compiles fine in ghc-6.12.3.
But, with ghc-7.0.4 we get
    Couldn't match expected type `LValue a0'
                with actual type `forall (lr :: * -> *). LR lr => lr a1'
    In the first argument of `(#=)', namely `x'
    In the expression: x #= x + 1

I can't really explain the rational behind the change in the ghc type system (Simon say it's simpler now), but I feel this has really made ghc worse for defining DSELs. When you define a DSEL you usually use the do-notation for the binding construct in the embedded language. This is the only programmable binding construct (by overloading (>>=)), so there is little choice. With the change in ghc-7 it means you can no longer make DSELs with a polymorhic binding construct (like I wanted here), because the binder seems to be monomorphic now

Please Simon, can we have polymorphic do bindings back?

Monday, May 02, 2011

More points for lazy evaluation

In a recent blog post Bob Harper shows one use of laziness, but I think he misses the real import points of laziness. So I will briefly enumerate what I think are the important points of lazyness (excuse me for not coming up with any kind of pun about points).

First, I'd like to say that I don't think strict or lazy matters that much in practice; they both work fine. I have programmed in regular non-strict Haskell as well as strict Haskell, and most things carry over well to a strict Haskell.  Furthermore, when I say strict language I mean a strict language with at least non-termination as an effect; total languages is another matter.

Lazy bindings

I like to tell Haskell beginners that any subexpression can be named and "pulled out", modulo name capture, of course. For instance
    ... e ...
is the same as
    let x = e
    in  ... x ...

The key thing is that
    ... e ... e ...
is the same as
    let x = e
    in  ... x ... x ...
so that common subexpressions can be given a name.

This is (in general) just wrong in a strict language, of course. Just take the simple example
    if c then error "BOO!" else 0
Which is not the same as
    let x = error "BOO!"
    in  if c then x else 0

In this case you can easily fix the problem by delaying the computation with a lambda (a common theme).
    let x () = error "BOO!"
    in  if c then x () else 0

But for a slightly more complicated example this simple technique goes wrong. Consider
    map (\ a -> a + expensive) xs
where expensive does not depend on a. In this case you want to move the expensive computation out of the loop (cf. loop invariant code motion in imperative languages). Like so
    let x = expensive
    in  map (\ a -> a + x) xs
In a lazy language x will be evaluated exactly zero times or once, just as we want. Using the delaying trick doesn't work here:
    let x () = expensive
    in  map (\ a -> a + x ()) xs
since expensive will get evaluated once for every list element.

This is easy enough to remedy, by introducing an abstraction for lazy computations (which will contain an assignment to compute the value just once). The signature for the abstract type of lazy values is something like
    data Lazy a
    delay :: (() -> a) -> Lazy a
    force :: Lazy a -> a
Note that the delay needs to take a function to avoid the a being evaluated early.
(This is probably what Bob would name a benign effect and is easily programmed using unsafePerformIO, which means it needs careful consideration.)

And so we get
    let x = delay (\ () -> expensive)
    in  map (\ a -> a + force x) xs
This isn't exactly pretty, but it works fine. In a language with macros the ugliness can be hidden better.

Lazy functions

Even strict languages like ML and C have some lazy functions even if they don't call them that, like SML's if, andthen, and orelse. You really need the if construct to evaluate the condition and then one of the branches depending on the condition. But what if I want to define my own type with the same kind of functions? In ML I can't, in C I would have to use a macro.

The ability to define new functions that can be used as control constructs is especially important when you want to design embedded domain specific languages. Take the simple example of the when (i.e., one-arm if) function in Haskell.
    when :: (Monad m) => Bool -> m () -> m ()
A quite common use of this function in monadic code is to check for argument preconditions in a function, like
    f x = do
        when (x < 0) $
            error "x must be >= 0"
        ...
If the when function is strict this is really bad, of course, since the call to error will happen before the when is called.

Again, one can work around this by using lazy values, like
    myAnd :: MyBool -> Lazy MyBool -> MyBool
    ...
    ... myAnd x (delay (\ () -> y)) ...
But in my opinion, this is too ugly to even consider. The intent of the function is obscured by the extra noise to make the second argument lazy.

I think every language needs a mechanism for defining some form of call-by-name functions. And many languages have this in the form of macros (maybe built in like in Lisp, or as a preprocessor like C).
If a language cannot define lazy function it simply lacks in abstraction power. I think any kind of fragment of the language should be nameable and reusable. (Haskell lacks this ability for patterns and contexts; a wart in the design.) So if you notice a repeated expression pattern, like
    if c then t else False
and cannot give this a name, like
    and c t = if c then t else False
and then use it with the same effect as the orginal expression, well, then your language is lacking.

For some language constructs the solution adopted by Smalltalk (and later Ruby), i.e., a very lightweight way on constructing closures is acceptable. So, for instance, I could accept writing
    ... myAnd x {y} ...

(In SML you could make something using functors, but it's just too ugly to contemplate.)

Lazy constructors

Lazy constructors were sort of covered in what Bob claimed to be the point of laziness, so I'll just mention them for completeness.  Sometimes you need them, but in my experience it's not very often.

Cyclic data structures

This is related to the last point.

Sometimes you really want cyclic data structures. An example are the Haskell data types in Data.Data that describe data types and constructors. A data type descriptor needs to contain a list of its constructors and a constructor descriptor needs to contain the data type descriptor.
In Haskell this can be described very naturally by having the two descriptors reference each other.
In SML this is not possible. You will have to break the cycle by somthing like a reference (or a function).
In OCaml you can define cyclic data structures in a similar fashion to Haskell, so this isn't really a problem with strict languages, but rather a feature that you can have if you like. Of course, being able to define cyclic data leads to non-standard elements in your data types, like
    data Nat = Zero | Succ Zero
    omega :: Nat
    omega = Succ omega
So having the ability to define cyclic data structures is a double edged sword.
I find the lack of a simple way to define cyclic data a minor nuisance only.

Reuse

I've saved my biggest gripe of strict evaluation for last. Strict evaluation is fundamentally flawed for function reuse.
What do I mean? I will illustrate with and example.
Consider the any function is Haskell:
any :: (a -> Bool) -> [a] -> Bool
any p = or . map p

It's quite natural to express the any function by reusing the
map and or functions.  Unfortunately, it doesn't
behave like we would wish in a strict language.  The any function should scan the list from the head forwards and as soon as an
element that fulfills the predicate is found it should return true and stop
scanning the list.  In a strict language this would not happen, since
the predicate will be applied to every element before the or
examines the elements.
So we are forced to manually fuse the two functions, doing so we get:
any :: (a -> Bool) -> [a] -> Bool
any p = foldr False (\ x r -> p x || r)

or :: [Bool] -> Bool
or = foldr False (||)


foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)


But the misery doesn't end here. This still doesn't do the right thing, because the strict language will recurse all the way down the list since it will call foldr before f. So we either have to fuse again, or invent a new version of foldr that delays the recursive call.
One more fusion gets us to
any p []     = False
any p (y:ys) = y || any p ys

So where's the function reuse?  Nowhere in sight.


With strict evaluation you can no longer with a straight face tell people: don't use recursion, reuse the recursion patterns in map, filter, foldr, etc. It simply doesn't work (in general).

Using macros doesn't really save us this time, because of the recursive definitions. I don't really know of any way to fix this problem short of making all (most?) functions lazy, because the problem is pervasive.  I.e., in the example it would not be enough to fix foldr; all the functions involved need to be lazy to get the desired semantics.

I find this the biggest annoyance with strict evaluation, but at the same time it's just an annoyance, because you can always rewrite the code to make it work. But strict evaluation really, fundamentally stops you from reusing functions the way you can with laziness.

As an aside, the definition of any doesn't work in SML for another reason as well, namely the value restriction. But that's just a language wart, like the monomorphism restriction in Haskell.

Complexity

Another complaint Bob Harper had about lazy evaluation is the difficulty of finding out the complexity of functions. I totally agree that the space complexity is very messy in a lazy language. For (sequential) time complexity I don't see any need to worry.
If strict a function has O(f(n)) complexity in a strict language then it has complexity O(f(n)) in a lazy language as well.  Why worry? :)

Summing up

One of the most important principles in all software design is DRY, i.e., Don't Repeat Yourself. This means that common patterns that we can find in a program should be abstractable so you can reuse them. In a lazy language you can use functions for abstracting control structures, which a strict language does not permit (at least not in a convenient way). This can be mitigated by providing some other abstraction mechanism, like macros (hopefully some kind of sane macros).
For the reasons above, I find it more convenient to program in a lazy language than a strict one. But a strict language with the ability to define lazy bindings, lazy functions, and lazy constructors is good enough that I find it usable.

Wednesday, April 20, 2011

Ugly memoization

Here's a problem that I recently ran into. I have a function taking a string and computing some value. I call this function a lot, but a lot of the time the argument has occurred before. The function is reasonably expensive, about 10 us. Only about 1/5 of the calls to the function has a new argument.

So naturally I want to memoize the function. Luckily Hackage has a couple packages for memoization. I found data-memocombinators and MemoTrie and decided to try them. The basic idea with memoization is that you have a function like

  memo :: (a->b) -> (a->b)

I.e., you give a function to memo and you get a new function of the
same type back.  This new function behaves like the original one, but
it remembers every time it is used and the next time it gets the same
argument it will just return the remembered result.
This is only safe in a pure language, but luckily Haskell is pure.
In an imperative language you can use a mutable memo table that stores all the argument-result pairs and updates the memo table each time the function is used. But how is it even possible to implement that in a pure language? The idea is to lazily construct the whole memo table in the call to memo, and it will then be lazily filled in.
Assume that all values of the argument type a can be enumerated by the method enumerate, we could then write memo like this:

  memo f =
      let table = [ (x, f x) | x <- enumerate ]
      in  \ y -> let Just r = lookup y table in r

Note how the memo table is constructed given just f, and this memo
table is then used in the returned function.
The type of this function would be something like

  memo (Enumerate a, Eq a) => (a->b) -> (a->b)

assuming that the class Enumerate has the magic method enumerate.

This just a very simplified example, if you tried to use this it would be terrible because the returned function does linear lookup in a list. Instead we want some kind of search tree, which is what the two packages I mention implement. The MemoTrie package does this in a really beautiful way, I recommend reading Conal's blog post about it.
OK, enough preliminaries. I used criterion to perform the benchmarking, and I tried with no memoization (none), memo-combinators (comb), and MemoTrie (beau). I had a test function taking about 10us, and then i called this functions with different number of repeated arguments: 1, 2, 5, and 10. I.e., 5 means that each argument occurred 5 times as the memoized function was called.

1 2 5 10
none 10.7 10.7 10.7 10.7
comb 62.6 52.2 45.8 43.4
beau 27.6 17.0 10.4 8.1

So with no memoization the time per call was 10.7 us all the time, no surprise there. With the memo combinators it was much slower than no memoization; the overhead for looking something up is bigger than the cost of computing the result. So that was a failure. The MemoTrie does better, at about an argument repetition of five it starts to break even, and at ten it's a little faster to memoize.

Since I estimated my repetition factor in the real code to be about five even the fastest memoization would not be any better then recomputation. So now what? Give up? Of course not! It's time to get dirty.

Once you know a function can be implemented in a pure way, there's no harm in implementing the same function in an impure way as long as it presents the pure interface. So lets write the memo function the way it would be done in, e.g., Scheme or ML. We will use a reference to hold a memo table that gets updated on each call. Here's the code, with the type that the function gets.

import Data.IORef
import qualified Data.Map as M

memoIO :: (Ord a) => (a -> b) -> IO (a -> IO b)
memoIO f = do
  v <- newIORef M.empty
  let f' x = do
        m <- readIORef v
        case M.lookup x m of
          Nothing -> do let { r = f x }; writeIORef v (M.insert x r m); return r
          Just r  -> return r
  return f'

The memoIO allocated a reference with an empty memo table.
We then define a new function, f', which when it's called
with get the memo table and look up the argument.  If the argument is
in the table then we just return the result, if it's not then we
compute the result, store it in the table, and return it.
Good old imperative programming (see below why this code is not good
imperative code).

But, horror, now the type is all wrong, there's IO in two places. The function we want to implement is actually pure. So what to do? Well, if you have a function involving the IO type, but you can prove it is actually pure, then (and only then) you are allowed to use unsafePerformIO.

I'll wave my hands instead of a proof (but more later), and here we go

  memo :: (Ord a) => (a -> b) -> (a -> b)
  memo f = let f' = unsafePerformIO (memoIO f) in \ x -> unsafePerformIO (f' x)

Wow, two unsafePerformIO on the same line.  It doesn't get
much less safe than that.
Let's benchmark again:

1 2 5 10
none 10.7 10.7 10.7 10.7
comb 62.6 52.2 45.8 43.4
beau 27.6 17.0 10.4 8.1
ugly 13.9 7.7 3.9 2.7

Not too shabby, using the ugly memoization is actually a win already at two, and just a small overhead if the argument occurs once.  We have a winner!

No so fast, there's

A snag

My real code can actually be multi-threaded, so the memo function had better work in a multi-threaded setting. Well, it doesn't. There's no guarantee about readIORef and writeIORef when doing multi-threading.
So we have to rewrite it. Actually, the code I first wrote is the one below; I hardly ever use IORef because I want it to work with multi-threading.

memoIO f = do
  v <- newMVar M.empty
  let f' x = do
        m <- takeMVar v
        case M.lookup x m of
          Nothing -> do let { r = f x }; putMVar v (M.insert x r m); return r
          Just r  -> do                  putMVar v m;                return r
  return f'

So now we use an MVar instead.  This makes it thread safe.
Only one thread can execute between the takeMVar and the
putMVar.  This guarantees than only one thread can update the
memo table at a time.  If two threads try at the same time one has to
wait a little.  How long?  The time it takes for the lookup, plus some
small constant.  Remember that Haskell is lazy, the the (f x)
is not actually computed with the lock held, which is good.
So I think this is a perfectly reasonable memoIO. And we can do the same unsafe trick as before and make it pure. Performance of this version is the same as with the IORef

Ahhhh, bliss. But wait, there's

Another snag

That might look reasonable, but in fact the memo function is broken now. It appears to work, but here's a simple use that fails

  sid :: String ->; String
  sid = memo id

  fcn s = sid (sid s)

What will happen here?  The outer call to sid will execute
the takeMVar and then do the lookup.  Doing the lookup with
evaluate the argument, x.  But this argument is another call
to sid, this will try to execute the takeMVar.
Disaster has struck, deadlock.

What happened here? The introduction of unsafePerformIO ruined the sequencing guaranteed by the IO monad that would have prevented the deadlock if we had used memoIO. I got what I deserved for using unsafePerformIO.

Can it be repaired? Well, we could make sure x is fully evaluated before grabbing the lock. I settled for a different repair, where the lock is held in a shorter portion of the code.

memoIO f = do
  v <- newMVar M.empty
  let f' x = do
        m <- readMVar v
        case M.lookup x m of
          Nothing -> do let { r = f x }; m <- takeMVar v; putMVar v (M.insert x r m); return r
          Just r  -> return r
  return f'

This solution has its own problem.  It's now possible for several threads
to compute (f x) for the same x and the result of
all but one of those will be lost by overwriting the table.  This is a
price I'm willing to pay for this application.


Moral

Yes, you can use imperative programming to implement pure functions. But the onus is on you to prove that it is safe. This is not as easy as you might think. I believe my final version is correct (with the multiple computation caveat), but I'm not 100% sure.