Chapter 8. More work with typeclasses

Table of Contents

More helpful errors
Making an instance with a type synonym
Living in an open world
When do overlapping instances cause problems?
Relaxing the rules for overlapping instances
How does show work for strings?
How to give a type a secret identity
Differences between data and newtype declarations
Summary: the three ways of naming types
JSON typeclasses without overlapping instances
Exercises
The dreaded monomorphism restriction

The JValue type that we introduced in the section called “Representing JSON data in Haskell” is not especially easy to work with. Here is a truncated and tidied snippet of some real JSON data, produced by a well known search engine.

{
  "query": "awkward squad haskell",
  "estimatedCount": 3920,
  "moreResults": true,
  "results":
  [{
    "title": "Simon Peyton Jones: papers",
    "snippet": "Tackling the awkward squad: monadic input/output ...",
    "url": "http://research.microsoft.com/~simonpj/papers/marktoberdorf/",
   },
   {
    "title": "Haskell for C Programmers | Lambda the Ultimate",
    "snippet": "... the best job of all the tutorials I've read ...",
    "url": "http://lambda-the-ultimate.org/node/724",
   }]
}

And here's a further slimmed down fragment of that data, represented in Haskell.

import SimpleJSON

result :: JValue
result = JObject [
  ("query", JString "awkward squad haskell"),
  ("estimatedCount", JNumber 3920),
  ("moreResults", JBool True),
  ("results", JArray [
     JObject [
      ("title", JString "Simon Peyton Jones: papers"),
      ("snippet", JString "Tackling the awkward ..."),
      ("url", JString "http://.../marktoberdorf/")
     ]])
  ]

Because Haskell doesn't natively support lists that contain types of different value, we can't directly represent a JSON object that contains values of different types. Instead, we must wrap each value with a JValue constructor. This limits our flexibility: if we want to change the number 3920 to a string "3,920", we must change the constructor that we use to wrap it from JNumber to JString.

Haskell's typeclasses offer a tempting solution to this problem.

type JSONError = String

class JSON a where
    toJValue :: a -> JValue
    fromJValue :: JValue -> Either JSONError a

instance JSON JValue where
    toJValue = id
    fromJValue = Right

Now, instead of applying a constructor like JNumber to a value to wrap it, we apply the toJValue function. If we change a value's type, the compiler will choose a suitable implementation of toJValue to use with it.

We also provide a fromJValue function, which attempts to convert a JValue into a value of our desired type.

More helpful errors

The return type of our fromJValue function uses the predefined Either type. Like Maybe, this type is predefined for us, and we'll often use it to represent a computation that could fail.

While Maybe is useful for this purpose, it gives us no information if a failure occurs: we literally have Nothing. The Either type has a similar structure, but instead of Nothing, the “something bad happened” constructor is named Left, and it takes a parameter.

data Maybe a = Nothing
             | Just a
               deriving (Eq, Ord, Read, Show)

data Either a b = Left a
                | Right b
                  deriving (Eq, Ord, Read, Show)

Quite often, the type we use for the a parameter value is String, which lets us return an error message if something goes wrong. To see how we use the Either type in practice, let's look at a simple instance of our typeclass.

instance JSON Bool where
    toJValue = JBool
    fromJValue (JBool b) = Right b
    fromJValue _ = Left "not a JSON boolean"

Making an instance with a type synonym

The Haskell 98 standard does not allow us to write an instance of the following form, even though it seems perfectly reasonable.

instance JSON String where
    toJValue = JString
    fromJValue (JString s) = Right s
    fromJValue _ = Left "not a JSON string"

Recall that String is a synonym for [Char], which in turn is the type [a] where Char is substituted for the type parameter a. According to Haskell 98's rules, we are not allowed to supply a type in place of a type parameter when we write an instance. In other words, it would be legal for us to write an instance for [a], but not for [Char].

While GHC follows the Haskell 98 standard by default, we can relax this particular restriction by placing a specially formatted comment at the top of our source file.

{-# LANGUAGE TypeSynonymInstances #-}

This comment is a directive to the compiler, called a pragma, which tells it to enable a language extension. The TypeSynonymInstances language extension makes the above code legal. We'll encounter a few other language extensions in this chapter, and a handful more later in this book.

Living in an open world

Haskell's typeclasses are intentionally designed to let us create new instances of a typeclass whenever we like.

doubleToJValue :: (Double -> a) -> JValue -> Either JSONError a
doubleToJValue f (JNumber v) = Right (f v)
doubleToJValue _ _ = Left "not a JSON number"

instance JSON Int where
    toJValue = JNumber . realToFrac
    fromJValue = doubleToJValue round

instance JSON Integer where
    toJValue = JNumber . realToFrac
    fromJValue = doubleToJValue round

instance JSON Double where
    toJValue = JNumber
    fromJValue = doubleToJValue id

We can add new instances anywhere; they are not confined to the module where we define a typeclass. This feature of the typeclass system is referred to as its open world assumption. If we could say “the following are the only instances of this typeclass that can exist”, we could have a closed world.

One useful instance we'd like to be able to write is to turn a list into what JSON calls an array. We won't worry about implementation details just yet, so let's use undefined as the bodies of the instance's methods.

instance (JSON a) => JSON [a] where
    toJValue = undefined
    fromJValue = undefined

It would also be convenient if we could turn a list of name/value pairs into a JSON object.

instance (JSON a) => JSON [(String, a)] where
    toJValue = undefined
    fromJValue = undefined

When do overlapping instances cause problems?

If we put these definitions into a source file and load them into ghci, everything initially seems fine.

ghci> :load BrokenClass

BrokenClass.hs:9:7:
    Could not find module `SimpleJSON':
      Use -v to see a list of the files searched for.
Failed, modules loaded: none.

However, once we try to use the list-of-pairs instance, we run into trouble.

ghci> toJValue [("foo","bar")]

<interactive>:1:0: Not in scope: `toJValue'

This problem of overlapping instances is a consequence of Haskell's open world assumption. Here's a simpler example that makes it clearer what's going on.

class Borked a where
    bork :: a -> String

instance Borked Int where
    bork = show

instance Borked (Int, Int) where
    bork (a, b) = bork a ++ ", " ++ bork b

instance (Borked a, Borked b) => Borked (a, b) where
    bork (a, b) = ">>" ++ bork a ++ " " ++ bork b ++ "<<"

We have two instances of the typeclass Borked for pairs: one for a pair of Ints and another for a pair of anything else that's Borked.

Suppose that we want to bork a pair of Int values. To do so, the compiler must choose an instance to use. Because these instances are right next to each other, it may seem that it could simply choose the more specific instance.

However, GHC is conservative by default, and insists that there must be only one possible instance that it could use. It will thus report an error if we try to use bork.

[Note]Note

As we mentioned earlier, we can scatter instances of a typeclass across several modules. GHC does not complain about the mere existence of overlapping instances. Instead, it only complains when we try to use a method of the affected typeclass, when it is forced to make a decision about which instance to use.

Relaxing the rules for overlapping instances

GHC supports a useful language extension, OverlappingInstances: when there are multiple overlapping instances to choose from, this extension causes the compiler to pick the most specific one.

We frequently use this extension together with TypeSynonymInstances, so that we can provide an instance for String and an instance for other types of list. Here's an example.

{-# LANGUAGE TypeSynonymInstances, OverlappingInstances #-}

import Data.List

class Foo a where
    foo :: a -> String

instance Foo a => Foo [a] where
    foo = concat . intersperse ", " . map foo

instance Foo Char where
    foo c = [c]

instance Foo String where
    foo = id

With the OverlappingInstances extension enabled, GHC will still reject code if it finds more than one most specific instance.

[Note]When to use the OverlappingInstances extension

Here's an important point: GHC treats OverlappingInstances as affecting the declaration of an instance, not a location where we use the instance. In other words, when we define an instance that we wish to allow to overlap with another instance, we must enable the extension for the module that contains the definition. When it compiles the module, GHC will mark that instance as “can be overlapped with other instances”.

Once we import this module and use the instance, we won't need to enable OverlappingInstances in the importing module: GHC will already know that the instance was marked as “okay to overlap” when it was defined.

This behaviour is useful when we are writing a library: it lets us choose to create overlappable instances, without forcing clients of our library to enable language extensions in order to use it.

How does show work for strings?

The OverlappingInstances and TypeSynonymInstances language extensions are specific to GHC, and by definition were not present in Haskell 98. However, the familiar Show typeclass from Haskell 98 somehow renders a list of Char differently from a list of Int. It achieves this via a clever, but simple, trick.

The Show class defines both a show method and a showList method. The default implementation of showList renders elements using square brackets and commas. The instance of Show for [a] uses showList. The instance of Show for Char simply provides a special implementation of showList that uses double quotes and escapes non-ASCII-printable characters.

At least sometimes, then, we can avoid the need for the OverlappingInstances extension with a little bit of lateral thinking.

How to give a type a secret identity

In addition to the familiar data keyword, Haskell provides us with another way to create a new type, via newtype.

data DataInt = D Int
    deriving (Eq, Ord, Show)

newtype NewtypeInt = N Int
    deriving (Eq, Ord, Show)

The purpose of a newtype declaration is to rename an existing type, giving it a distinct identity. As we can see, it is similar in appearance to a type declaration using the data keyword.

When we declare a newtype, we can choose to make any of the underlying type's typeclass instances available. Here, we've elected to make NewtypeInt provide Int's instances for Eq, Ord and Show. As a result, we can compare and print values of type NewtypeInt.

ghci> N 1 < N 2
True

However, since we are not exposing Int's Num or Integral instances, values of type NewtypeInt are not numbers. For instance, we can't add them.

ghci> N 313 + N 37

<interactive>:1:0:
    No instance for (Num NewtypeInt)
      arising from a use of `+' at <interactive>:1:0-11
    Possible fix: add an instance declaration for (Num NewtypeInt)
    In the expression: N 313 + N 37
    In the definition of `it': it = N 313 + N 37

As with the data keyword, we can use a newtype's value constructor to create a new value, or to pattern match on an existing value.

Differences between data and newtype declarations

The newtype keyword exists to give an existing type a new identity, and it is more restricted in how we can use it than the data keyword. Specifically, a newtype can only have one value constructor, and that constructor must have exactly one field.

-- ok: any number of fields and constructors
data TwoFields = TwoFields Int Int

-- ok: exactly one field
newtype Okay = ExactlyOne Int

-- ok: type parameters are no problem
newtype Param a b = Param (Either a b)

-- ok: record syntax is fine
newtype Record = Record {
      getInt :: Int
    }

-- bad: no fields
newtype TooFew = TooFew

-- bad: more than one field
newtype TooManyFields = Fields Int Int

-- bad: more than one constructor
newtype TooManyCtors = Bad Int
                     | Worse Int

Beyond this, there's another important difference between data and newtype. A type created with the data keyword has some runtime book-keeping overhead, to record which constructor a value was created with. A newtype value, on the other hand, can only have one constructor, and so does not need this overhead. This makes it more space- and time-efficient at runtime.

Because a newtype's constructor is used only at compile time and doesn't have a runtime representation, pattern matching on undefined behaves differently.

To understand the difference, let's first review what we might expect with a normal datatype. We are already familiar with the idea that if undefined is evaluated at runtime, it causes a crash.

ghci> undefined
*** Exception: Prelude.undefined

Here is a pattern match where we construct a DataInt using the D constructor, and put undefined inside.

ghci> case D undefined of D _ -> 1
1

Since our pattern matches against the constructor but doesn't inspect the payload, the undefined remains unevaluated and does not cause a crash.

In this example, we're not using the D constructor, so the unprotected undefined gets evaluated when the pattern match occurs, and we crash.

ghci> case undefined of D _ -> 1
*** Exception: Prelude.undefined

When we use the N constructor, we get the same behaviour as we did with the D constructor: no crash.

ghci> case N undefined of N _ -> 1
1

The crucial difference arises when we get rid of the N constructor from the expression, and match against an unprotected undefined.

ghci> case undefined of N _ -> 1
1

We don't crash! Because there's no constructor present at runtime, matching against N _ is actually equivalent to matching against the plain wild card _: since this always matches, the expression doesn't need to be evaluated.

[Tip]Another perspective on newtype constructors

Even though we use a newtype type's constructor in the same way as a data type's constructor, it really exists just to coerce a value between the two types.

In other words, when we apply the N constructor in an expression, we coerce an expression from type Int to type NewtypeInt as far as we and the compiler are concerned, but absolutely nothing happens at runtime.

Similarly, when we match on the N constructor in a pattern, we coerce an expression from type NewtypeInt to Int, but again there's no overhead involved at runtime.

Summary: the three ways of naming types

Here's a brief recap of Haskell's three ways to introduce new names for types.

  • The data keyword introduces a truly new albegraic data type.

  • The type keyword gives us a synonym to use for an existing type. We can use the type and its synonym interchangeably.

  • The newtype keyword gives an existing type a distinct identity. The original type and the new type are not interchangeable.

JSON typeclasses without overlapping instances

Enabling GHC's support for overlapping instances might be a tempting and quick way to make our JSON code happy, but it won't teach us how to work with newtype. In any case, we will on occasion be faced with several equally good instances, in which case overlapping instances will not save us.

Our first task, then, is to wrap up the list type so that the compiler will not see it as a list.

newtype JAry a = JAry {
      fromJAry :: [a]
    } deriving (Eq, Ord, Show)

When we export this type from our module, we'll export the complete details of the type. Our module header will look like this:

module JSONClass
    (
      JAry(..)
    ) where

The “(..)” following the JAry name means “export all details of this type”.

Usually, when we export a newtype, we do not export the data constructor. Instead, we define a function that applies the constructor for us.

jary :: [a] -> JAry a
jary = JAry

We then export the type constructor, the deconstructor function, and the construction function, but not the data constructor.

module JSONClass
    (
      JAry(fromJAry)
    , jary
    ) where

However, the usual reason to avoid exporting the data constructor is to “hide the plumbing”, keeping our type abstract. When we don't export a type's data constructor, clients of our library can only use the functions we provide to construct and deconstruct values of that type. That gives us, the library authors, the liberty to change our internal representation if we need to. If we export the data constructor, and decide to change the details of our type, we'll run the risk of breaking any existing code that uses the constructor.

In this case, we really won't gain anything by making the array wrapper abstract, so it makes more sense if we simply export the entire definition.

We provide another wrapper type that hides our representation of a JSON object.

newtype JObj a = JObj {
      fromJObj :: [(String, a)]
    } deriving (Eq, Ord, Show)

With these types defined, we make small changes to the definition of our JValue type.

data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject (JObj JValue)   -- was [(String, JValue)]
            | JArray (JAry JValue)    -- was [JValue]
              deriving (Eq, Ord, Show)

This change doesn't affect the instances of the JSON typeclass that we've already written, but we'd like to write instances for our new JAry and JObj types.

jaryFromJValue :: (JSON a) => JValue -> Either JSONError (JAry a)

jaryToJValue :: (JSON a) => JAry a -> JValue

instance (JSON a) => JSON (JAry a) where
    toJValue = jaryToJValue
    fromJValue = jaryFromJValue

Let's take a slow walk through the individual steps of converting a JAry a to a JValue. Given a list where we know that everything inside is a JSON instance, converting it to a list of JValues is easy.

listToJValues :: (JSON a) => [a] -> [JValue]
listToJValues = map toJValue

Taking this and wrapping it to become a JAry JValue is just a matter of applying the newtype's type constructor.

jvaluesToJAry :: [JValue] -> JAry JValue
jvaluesToJAry = JAry

(Remember, this has no performance cost. We're just telling the compiler to hide the fact that we're using a list.) To turn this into a JValue, we apply another type constructor.

jaryOfJValuesToJValue :: JAry JValue -> JValue
jaryOfJValuesToJValue = JArray

Assemble these pieces using function composition, and we get a concise one-liner for converting to a JValue.

jaryToJValue = JArray . JAry . map toJValue . fromJAry

We have more work to do to convert from a JValue to a JAry a, but we'll break it into reusable parts. The basic function is straightforward.

jaryFromJValue (JArray (JAry a)) =
    whenRight JAry (mapEithers fromJValue a)
jaryFromJValue _ = Left "not a JSON array"

The whenRight function inspects its argument: calls a function on it if it was created with the Right constructor, and leaves a Left value untouched.

whenRight :: (b -> c) -> Either a b -> Either a c
whenRight _ (Left err) = Left err
whenRight f (Right a) = Right (f a)

More complicated is mapEithers. It acts like the regular map function, but if it ever encounters a Left value, it returns that immediately, instead of continuing to accumulate a list of Right values.

mapEithers :: (a -> Either b c) -> [a] -> Either b [c]
mapEithers f (x:xs) = case mapEithers f xs of
                        Left err -> Left err
                        Right ys -> case f x of
                                      Left err -> Left err
                                      Right y -> Right (y:ys)
mapEithers _ _ = Right []

Because the elements of the list hidden in the JObj type have a little more structure, the code to convert to and from a JValue is a bit more complex. Fortunately, we can reuse the functions that we just defined.

import Control.Arrow (second)

instance (JSON a) => JSON (JObj a) where
    toJValue = JObject . JObj . map (second toJValue) . fromJObj

    fromJValue (JObject (JObj o)) = whenRight JObj (mapEithers unwrap o)
      where unwrap (k,v) = whenRight ((,) k) (fromJValue v)
    fromJValue _ = Left "not a JSON object"

Exercises

1.

Load the Control.Arrow module into ghci, and find out what the second function does.

2.

What is the type of (,)? When you use it in ghci, what does it do? What about (,,)?

The dreaded monomorphism restriction

The Haskell 98 standard has a subtle feature that can sometimes bite us in unexpected circumstances. Here's a simple function definition that illustrates the issue.

myShow = show

If we try to load this definition into ghci, it issues a peculiar complaint.

ghci> :load Monomorphism
[1 of 1] Compiling Main             ( Monomorphism.hs, interpreted )

Monomorphism.hs:2:9:
    Ambiguous type variable `a' in the constraint:
      `Show a' arising from a use of `show' at Monomorphism.hs:2:9-12
    Possible cause: the monomorphism restriction applied to the following:
      myShow :: a -> String (bound at Monomorphism.hs:2:0)
    Probable fix: give these definition(s) an explicit type signature
                  or use -fno-monomorphism-restriction
Failed, modules loaded: none.

The “monomorphism restriction” to which the error message refers is a part of the Haskell 98 standard. Monomorphism is simply the opposite of polymorphism: it indicates that an expression has exactly one type. The restriction lies in the fact that Haskell sometimes forces a declaration to be less polymorphic than we would expect.

We mention the monomorphism restriction here because although it isn't specifically related to typeclasses, they usually provide the circumstances in which it crops up.

[Tip]Tip

It's possible that you will not run into the monomorphism restriction in real code for a long time. We don't think you need to try to remember the details of this section. It should suffice to make a mental note of its existence, until eventually GHC complains at you with something like the above error message. If that occurs, simply remember that you read about the error here, and come back for guidance.

We won't attempt to explain the monomorphism restriction[10]. The consensus within the Haskell community is that it doesn't arise often; it is tricky to explain; it provides almost no practical benefit; and so it mostly serves to trip people up. For an example of its trickiness, while the definition above falls afoul of it, the following two compile without problems.

myShow2 value = show value

myShow3 :: (Show a) => a -> String
myShow3 = show

As these alternative definitions suggest, if GHC complains about the monomorphism restriction, we have three easy ways to address the error.

  • Make the function's arguments explicit, instead of leaving them implicit.

  • Give the definition an explicit type signature, instead of making the compiler infer its type.

  • Leave the code untouched, and compile the module with the NoMonomorphismRestriction language extension enabled. This disables the monomorphism restriction.

Because the monomorphism restriction is unwanted and unloved, it will almost certainly be dropped from the next revision of the Haskell standard. This does not quite mean that compiling with NoMonomorphismRestriction is always the right thing to do: some Haskell compilers (including older versions of GHC) do not understand this extension, but they'll accept either of the other approaches to making the error disappear. If this degree of portability isn't a concern to you, then by all means enable the language extension.



[10] If you simply must read the gory details, see section 4.5.5 of the Haskell 98 Report.

Want to stay up to date? Subscribe to the comment feed for this chapter, or the entire book.

Copyright 2007 Bryan O'Sullivan, Don Stewart, and John Goerzen. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 License. Icons by Paul Davey aka Mattahan.