Chapter 15. Data Structures

Table of Contents

Association Lists
Maps
Functions Are Data, Too
Extended Example: /etc/passwd
Extended example: Numeric Types
First Steps
Completed Code
Taking advantage of functions as data
Turning difference lists into a proper library
Lists, difference lists, and monoids

FIXME: Mutable storage with MVars needs to go in threading area

Back in FIXME: add ref to chapter 3ish, you saw how to use type to create handy aliases for types. That's a useful feature, but in this chapter we'll take it a step further. We'll show you how to create entirely new types. After doing that, we'll also show you some of the built-in tools that Haskell provides for arranging large amounts of data.

Before going on, there's a word that you might expect to see frequently in a chapter such as this: object. You're not going to see it at all in this chapter because Haskell isn't an object-oriented language. Concepts in this chapter may have similar-sounding names, but are likely quite different than the object-oriented ideas you may be familiar with already.

In this chapter, you'll learn how to create and use your own types. We'll start with very basic types, and progress all the way to defining your own numeric types. We'll have an example showing you how you can extend Haskell's numeric type system to support units of measure and symbolic manipulations. We think you'll find Haskell's typing system powerful and -- surprise -- even exciting.

FIXME: need to integrate this elsewhere

Named fields also make it easy to "modify" one or more components of a type. Here's an example:

ghci> (CustomColor2 100 0 5) {green = 200}

<interactive>:1:1: Not in scope: data constructor `CustomColor2'

<interactive>:1:24: Not in scope: `green'
ghci> (CustomColor2 100 0 5) {red = 50, green = 200}

<interactive>:1:1: Not in scope: data constructor `CustomColor2'

<interactive>:1:24: Not in scope: `red'

<interactive>:1:34: Not in scope: `green'

Of course, in Haskell, you don't actually modify an existing value. This syntax creates a copy of your original value, identical to the original in every way except the changes you specify.

Association Lists

Often times, we have to deal with data that is unordered but is indexed by a key. For instance, a Unix administrator might have a list of numeric UIDs and the textual usernames that they correspond to. The utility of this list is being able to look up a textual username for a given UID, not the order of the data. In otherwords, the UID is a key into a database.

In Haskell, there are several ways to handle data that is structured in this way. The two most common are association lists and the Data.Map module. Association lists are handy because they are simple. They are standard Haskell lists, so all the functions that work on those lists work on association lists. However, for large data sets, Data.Map will have a considerable performance advantage over association lists. We'll consider both in this chapter.

An association list is just a normal list containing (key, value) tuples. The type of a list of mappings from UID to username might be [(Integer, String)]. You could use just about any type for both the key and the value.

You can build association lists just like you would build any other list. Haskell comes with one built-in function called Data.List.lookup to look up data in an association list. Its type is Eq a => a -> [(a, b)] -> Maybe b. Can you guess how it works from that type? Let's take a look in ghci.

ghci> let al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]
ghci> lookup 1 al
Just "one"
ghci> lookup 5 al
Nothing

The lookup function is really simple. Here's one way you could write it:

myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((thiskey,thisval):rest) =
    if key == thiskey
       then Just thisval
       else myLookup key rest

This function returns Nothing if passed the empty list. Otherwise, it compares the key with the key we're looking for. If a match is found, the corresponding value is returned. Otherwise, it searches the rest of the list.

Let's take a look at a more complex example of association lists. On Unix/Linux machines, there is a file called /etc/passwd that stores usernames, UIDs, home directories, and various other information. Let's write a program that parses such a file, creates an association list, and lets the user look up a username by giving a UID.

-- ch14/passwd-al.hs

import Data.List
import System.IO
import Control.Monad(when)
import System.Exit
import System.Environment(getArgs)

main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right amount of args, give an error and abort
    when (length args /= 2) $ do
        putStrLn "Syntax: passwd-al filename uid"
        exitFailure

    -- Read the file lazily
    content <- readFile (args !! 0)

    -- Compute the username in pure code
    let username = findByUID content (read (args !! 1))

    -- Display the result
    case username of 
         Just x -> putStrLn x
         Nothing -> putStrLn "Could not find that UID"

-- Given the entire input and a UID, see if we can find a username.
findByUID :: String -> Integer -> Maybe String
findByUID content uid =
    let al = map parseline . lines $ content
        in lookup uid al

-- Convert a colon-separated line into fields
parseline :: String -> (Integer, String)
parseline input =
    let fields = split ':' input
        in (read (fields !! 2), fields !! 0)

{- | Takes a delimiter and a list.  Break up the last based on the
-  delimeter. -}
split :: Eq a => a -> [a] -> [[a]]

-- If the input is empty, the result is a list of empty lists.
split _ [] = [[]]
split delim str =
    let -- Find the part of the list before delim and put it in "before".
        -- The rest of the list, including the leading delim, goes
        -- in "remainder".
        (before, remainder) = span (/= delim) str
        in
        before : case remainder of
                      [] -> []
                      x -> -- If there is more data to process,
                           -- call split recursively to process it
                           split delim (tail x)

Let's look at this program. The heart of it is findByUID, which is a simple function that parses the input one line at a time, then calls lookup over the result. The remaining program is concerned with parsing the input. The input file looks like this:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
jgoerzen:x:1000:1000:John Goerzen,,,:/home/jgoerzen:/bin/bash
    

Maps

The Data.Map module is used for working with maps. This module has some functions with same names as those in Prelude or other common modules. Therefore, when using it, most people import it using import qualified Data.Map as Map and use Map.function to refer to functions in that module. Let's start our look at Data.Map by taking a look at some ways to build a map.

-- ch14/buildmap.hs

import qualified Data.Map as Map

-- Functions to generate a Map that represents an association list
-- as a map

al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]

{- | Create a map representation of 'al' by converting the association
-  list using Map.fromList -}
mapFromAL =
    Map.fromList al

{- | Create a map represetation of 'al' by doing a fold -}
mapFold =
    foldl (\map (k, v) -> Map.insert k v map) Map.empty al

{- | Manually create a map with the elements of 'al' in it -}
mapManual =
    Map.insert 2 "two" . 
    Map.insert 4 "four" .
    Map.insert 1 "one" .
    Map.insert 3 "three" $ Map.empty

Functions like Map.insert work in the usual Haskell way: they return a copy of the input data, with the requested change applied. This is quite handy with maps. It means that you can use foldl to build up a map as in the mapFold example. Or, you can chain together calls to Map.insert as in the mapManual example. Let's use ghci to verify that all of these maps are as expected:

ghci> :l buildmap.hs
[1 of 1] Compiling Main             ( buildmap.hs, interpreted )
Ok, modules loaded: Main.
ghci> al
Loading package array-0.1.0.0 ... linking ... done.
Loading package containers-0.1.0.1 ... linking ... done.
[(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapFromAL
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapFold
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapManual
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]

Notice that the output from mapManual doesn't occur in the order it was passed in. Maps do not guarantee that they will preserve the original ordering.

Maps operate similar in concept to association lists. The Data.Map module provides functions for adding and removing data from maps. It also provides functions for converting maps back and forth to association lists, filtering them, modifying them, and folding them. The library documentation for this module is good, so instead of going into detail on each function, we're going to present an example that ties together much of the concepts we've discussed in this chapter.

Functions Are Data, Too

FIXME: reviewers wanted better examples, perhaps based on CustomColor instead of something returning a tuple. Think of one.

Back in the beginning of the chapter, we reminded you that Haskell isn't object-oriented. Part of Haskell's power is the ease with which you can create and manipulate functions with it. Let's take a look at a record that stores a function as one of its fields:

-- ch14/funcrecs.hs

{- | Our usual CustomColor type to play with -}
data CustomColor =
  CustomColor {red :: Int,
               green :: Int,
               blue :: Int}
  deriving (Eq, Show, Read)

{- | A new type that stores a name and a function.

The function takes an Int, applies some computation to it, and returns
the Int along with a CustomColor -}
data FuncRec =
    FuncRec {name :: String,
             colorCalc :: Int -> (CustomColor, Int)}

plus5func color x = (color, x + 5)

purple = CustomColor 255 0 255

plus5 = FuncRec {name = "plus5", colorCalc = plus5func purple}
always0 = FuncRec {name = "always0", colorCalc = \_ -> (purple, 0)}

Notice the type of the colorCalc field: it's a function. It takes an Int and returns a tuple of (CustomColor, Int). We create two FuncRec records: plus5 and always0. Notice that the colorCalc for both of them will always return the color purple. FuncRec itself has no field to store the color in, yet that value somehow becomes part of the function itself. This is called a closure. Let's play with this a bit:

ghci> :l funcrecs.hs
[1 of 1] Compiling Main             ( funcrecs.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t plus5
plus5 :: FuncRec
ghci> name plus5
"plus5"
ghci> :t colorCalc plus5
colorCalc plus5 :: Int -> (CustomColor, Int)
ghci> (colorCalc plus5) 7
(CustomColor {red = 255, green = 0, blue = 255},12)
ghci> :t colorCalc always0
colorCalc always0 :: Int -> (CustomColor, Int)
ghci> (colorCalc always0) 7
(CustomColor {red = 255, green = 0, blue = 255},0)

That worked well enough, but you might be wondering how to do something more advanced such as making a piece of data available multiple places. A type construction function can be helpful. Here's an example:

-- ch14/funcrecs2.hs 

data FuncRec =
    FuncRec {name :: String,
             calc :: Int -> Int,
             namedCalc :: Int -> (String, Int)}

mkFuncRec :: String -> (Int -> Int) -> FuncRec
mkFuncRec name calcfunc =
    FuncRec {name = name,
             calc = calcfunc,
             namedCalc = \x -> (name, calcfunc x)}

plus5 = mkFuncRec "plus5" (+ 5)
always0 = mkFuncRec "always0" (\_ -> 0)

Here we have a function called mkFuncRec that takes a String and another function as parameters, and returns a new FuncRec record. Notice how both parameters to mkFuncRec are used multiple places. Let's try it out:

ghci> :l funcrecs2.hs
[1 of 1] Compiling Main             ( funcrecs2.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t plus5
plus5 :: FuncRec
ghci> name plus5
"plus5"
ghci> (calc plus5) 5
10
ghci> (namedCalc plus5) 5
("plus5",10)
ghci> let plus5a = plus5 {name = "PLUS5A"}
ghci> name plus5a
"PLUS5A"
ghci> (namedCalc plus5a) 5
("plus5",10)

Notice the creation of plus5a. We changed the name field, but not the namedCalc field. That's why name has the new name, but namedCalc still returns the name that was passed to mkFuncRec; it doesn't change unless we explicitly change it.

Extended Example: /etc/passwd

In order to illustrate the usage of a number of different data structures together, we've prepared an extended example. This example parses and stores entries from files in the format of a typical /etc/passwd file.

-- ch14/passwdmap.hs

import Data.List
import qualified Data.Map as Map
import System.IO
import Text.Printf(printf)
import System.Environment(getArgs)
import System.Exit
import Control.Monad(when)

{- | The primary piece of data this program will store.
   It represents the fields in a POSIX /etc/passwd file -}
data PasswdEntry = PasswdEntry {
    userName :: String,
    password :: String,
    uid :: Integer,
    gid :: Integer,
    gecos :: String,
    homeDir :: String,
    shell :: String}
    deriving (Eq, Ord)

{- | Define how we get data to a 'PasswdEntry'. -}
instance Show PasswdEntry where
    show pe = printf "%s:%s:%d:%d:%s:%s:%s" 
                (userName pe) (password pe) (uid pe) (gid pe)
                (gecos pe) (homeDir pe) (shell pe)

{- | Converting data back out of a 'PasswdEntry'. -}
instance Read PasswdEntry where
    readsPrec _ value =
        case split ':' value of
             [f1, f2, f3, f4, f5, f6, f7] ->
                 -- Generate a 'PasswdEntry' the shorthand way:
                 -- using the positional fields.  We use 'read' to convert
                 -- the numeric fields to Integers.
                 [(PasswdEntry f1 f2 (read f3) (read f4) f5 f6 f7, [])]
             x -> error $ "Invalid number of fields in input: " ++ show x
        where 
        {- | Takes a delimiter and a list.  Break up the last based on the
        -  delimeter. -}
        split :: Eq a => a -> [a] -> [[a]]

        -- If the input is empty, the result is a list of empty lists.
        split _ [] = [[]]
        split delim str =
            let -- Find the part of the list before delim and put it in
                -- "before".  The rest of the list, including the leading 
                -- delim, goes in "remainder".
                (before, remainder) = span (/= delim) str
                in
                before : case remainder of
                              [] -> []
                              x -> -- If there is more data to process,
                                   -- call split recursively to process it
                                   split delim (tail x)

-- Convenience aliases; we'll have two maps: one from UID to entries
-- and the other from username to entries
type UIDMap = Map.Map Integer PasswdEntry
type UserMap = Map.Map String PasswdEntry

{- | Converts input data to maps.  Returns UID and User maps. -}
inputToMaps :: String -> (UIDMap, UserMap)
inputToMaps inp =
    (uidmap, usermap)
    where
    -- fromList converts a [(key, value)] list into a Map
    uidmap = Map.fromList . map (\pe -> (uid pe, pe)) $ entries
    usermap = Map.fromList . 
              map (\pe -> (userName pe, pe)) $ entries
    -- Convert the input String to [PasswdEntry]
    entries = map read (lines inp)

main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right number of args,
    -- give an error and abort

    when (length args /= 1) $ do
        putStrLn "Syntax: passwdmap filename"
        exitFailure

    -- Read the file lazily
    content <- readFile (head args)
    let maps = inputToMaps content
    mainMenu maps

mainMenu maps@(uidmap, usermap) = do
    putStr optionText
    sel <- getLine
    -- See what they want to do.  For every option except 4,
    -- return them to the main menu afterwards by calling
    -- mainMenu recursively
    case sel of
         "1" -> lookupUserName >> mainMenu maps
         "2" -> lookupUID >> mainMenu maps
         "3" -> displayFile >> mainMenu maps
         "4" -> return ()
         _ -> putStrLn "Invalid selection" >> mainMenu maps

    where 
    lookupUserName = do
        putStrLn "Username: "
        username <- getLine
        case Map.lookup username usermap of
             Nothing -> putStrLn "Not found."
             Just x -> print x
    lookupUID = do
        putStrLn "UID: "
        uidstring <- getLine
        case Map.lookup (read uidstring) uidmap of
             Nothing -> putStrLn "Not found."
             Just x -> print x
    displayFile = 
        putStr . unlines . map (show . snd) . Map.toList $ uidmap
    optionText = 
          "\npasswdmap options:\n\
           \\n\
           \1   Look up a user name\n\
           \2   Look up a UID\n\
           \3   Display entire file\n\
           \4   Quit\n\n\
           \Your selection: "

This example maintains two maps: one from username to PasswdEntry and another one from UID to PasswdEntry. Database developers may find it convenient to think of this has having two different indices into the data to speed searching on different fields.

Take a look at the Show and Read instances for PasswdEntry. There is already a standard format for rendering data of this type as a string: the colon-separated version already used by the system. So our Show function displays a PasswdEntry in the format, and Read parses that format.

Extended example: Numeric Types

We've told you how powerful and expressive Haskell's type system is. We've shown you a lot of ways to use that power. Here's a chance to really see that in action.

Back in the section called “Numeric Types”, we showed the numeric typeclasses that come with Haskell. Let's see what we can do by defining new types and utilizing the numeric typeclasses to integrate them with basic mathematics in Haskell.

Let's start by thinking through what we'd like to see out of ghci when we interact with our new types. To start with, it might be nice to render numeric expressions as strings, making sure to indicate proper precedence. Perhaps we could create a function called prettyShow to do that.

ghci> :l num.hs
[1 of 1] Compiling Main             ( num.hs, interpreted )
Ok, modules loaded: Main.
ghci> 5 + 1 * 3
8
ghci> prettyShow $ 5 + 1 * 3
"5+(1*3)"
ghci> prettyShow $ 5 * 1 + 3
"(5*1)+3"

That looks nice, but it wasn't all that smart. We could easily simplify out the 1 * part of the expression. How about a function to do some very basic simplification?

ghci> prettyShow $ simplify $ 5 + 1 * 3
"5+3"

How about converting a numeric expression to Reverse Polish Notation (RPN)? RPN is a postfix notation that never requires parentheses, and is commonly found on HP calculators. RPN is a stack-based notation. You enter numbers onto the stack, and when you enter operations, they pop the most recent numbers off the stack and place the result on the stack.

ghci> rpnShow $ 5 + 1 * 3
"5 1 3 * +"
ghci> rpnShow $ simplify $ 5 + 1 * 3
"5 3 +"

Maybe it would be nice to be able to represent simple expressions with symbols for the unknowns.

ghci> prettyShow $ 5 + (Symbol "x") * 3
"5+(x*3)"

It's often important to track units of measure when working with numbers. For instance, when you see the number 5, does it mean 5 meters, 5 feet, or 5 bytes? Of course, if you divide 5 meters by 2 seconds, the system ought to be able to figure out the appropriate units. Moreover, it should stop you from adding 2 seconds to 5 meters.

ghci> 5 / 2
2.5
ghci> (units 5 "m") / (units 2 "s")
2.5_m/s
ghci> (units 5 "m") + (units 2 "s")
*** Exception: Mis-matched units in add
ghci> (units 5 "m") + (units 2 "m")
7_m
ghci> (units 5 "m") / 2
2.5_m
ghci> 10 * (units 5 "m") / (units 2 "s")
25.0_m/s

If we define an expression or a function that is valid for all numbers, we should be able to calculate the result, or render the expression. For instance, if we define test to have type Num a => a, and say test = 2 * 5 + 3, then we ought to be able to do this:

ghci> test
13
ghci> rpnShow test
"2 5 * 3 +"
ghci> prettyShow test
"(2*5)+3"
ghci> test + 5
18
ghci> prettyShow (test + 5)
"((2*5)+3)+5"
ghci> rpnShow (test + 5)
"2 5 * 3 + 5 +"

Since we have units, we should be able to handle some basic trigonometry as well. Many of these operations operate on angles. Let's make sure that we can handle both degrees and radians.

ghci> sin (pi / 2)
1.0
ghci> sin (units (pi / 2) "rad")
1.0_1.0
ghci> sin (units 90 "deg")
1.0_1.0
ghci> (units 50 "m") * sin (units 90 "deg")
50.0_m

Finally, we ought to be able to put all this together and combine different kinds of expressions together.

ghci> ((units 50 "m") * sin (units 90 "deg")) :: Units (SymbolicManip Double)
50.0*sin(((2.0*pi)*90.0)/360.0)_m
ghci> prettyShow $ dropUnits $ (units 50 "m") * sin (units 90 "deg")
"50.0*sin(((2.0*pi)*90.0)/360.0)"
ghci> rpnShow $ dropUnits $ (units 50 "m") * sin (units 90 "deg")
"50.0 2.0 pi * 90.0 * 360.0 / sin *"
ghci> (units (Symbol "x") "m") * sin (units 90 "deg")
x*sin(((2.0*pi)*90.0)/360.0)_m

Perhaps a future excercise could enhance prettyShow to remove unnecessary parentheses as well.

Everything you've just seen is possible with Haskell types and classes. In fact, you've been reading a real ghci session demonstrating num.hs, which you'll see shortly.

First Steps

Let's think about how we would accomplish everything shown above. To start with, we might use ghci to check the type of (+), which is Num a => a -> a -> a. If we want to make possible some custom behavior for the plus operator, then we will have to define a new type and make it an instance of Num. This type will need to store an expression symbolically. We can start by thinking of operations such as addition. To store that, we will need to store the operation itself, its left side, and its right side. The left and right sides could themselves be expressions.

We can therefore think of an expression as a sort of tree. Let's start with some simple types.

-- ch14/numsimple.hs

-- The "operators" that we're going to support
data Op = Plus | Minus | Mul | Div | Pow
        deriving (Eq, Show)

{- The core symbolic manipulation type -}
data Num a => SymbolicManip a = 
          Number a           -- Simple number, such as 5
        | Arith Op (SymbolicManip a) (SymbolicManip a)
          deriving (Eq, Show)

{- SymbolicManip will be an instance of Num.  Define how the Num
operations are handled over a SymbolicManip.  This will implement things
like (+) for SymbolicManip. -}
instance Num a => Num (SymbolicManip a) where
    a + b = Arith Plus a b
    a - b = Arith Minus a b
    a * b = Arith Mul a b
    negate a = Arith Mul (Number (-1)) a
    abs a = error "abs is unimplemented"
    signum _ = error "signum is unimplemented"
    fromInteger i = Number (fromInteger i)

First, we define a type called Op. This type simply represents some of the operations we will intend to support. Next, there is a definition for SymbolicManip a. Because of the Num a constraint, any Num can be used for the a. So a full type may be something like SymbolicManip Int.

A SymbolicManip type can be a plain number, or it can be some arithmetic operation. The type for the Arith constructor is recursive, which is perfectly legal in Haskell. Arith creates a SymbolicManip out of an Op and two other SymbolicManip items. Let's look at an example:

ghci> :l numsimple.hs
[1 of 1] Compiling Main             ( numsimple.hs, interpreted )
Ok, modules loaded: Main.
ghci> Number 5
Number 5
ghci> :t Number 5
Number 5 :: (Num t) => SymbolicManip t
ghci> :t Number (5::Int)
Number (5::Int) :: SymbolicManip Int
ghci> Number 5 * Number 10
Arith Mul (Number 5) (Number 10)
ghci> (5 * 10)::SymbolicManip Int
Arith Mul (Number 5) (Number 10)
ghci> (5 * 10 + 2)::SymbolicManip Int
Arith Plus (Arith Mul (Number 5) (Number 10)) (Number 2)

You can see that we already have a very basic representation of expressions working. Notice how Haskell "converted" 5 * 10 + 2 into a SymbolicManip, and even handled order of evaluation properly. This wasn't really a true conversion; SymbolicManip is a first-class number now. Integer numeric literals are internally treated as being wrapped in fromInteger anyway, so 5 is just as valid as a SymbolicManip Int as it as an Int.

From here, then, our task is simple: extend the SymbolicManip type to be able to represent all the operations we will want to perform, implement instances of it for the other numeric typeclasses, and implement our own instance of Show for SymbolicManip that renders this tree in a more accessible fashion.

Completed Code

Here is the completed num.hs, which was used with the ghci examples at the beginning of this chapter.

-- ch14/num.hs

import Data.List

--------------------------------------------------
-- Symbolic/units manipulation
--------------------------------------------------

-- The "operators" that we're going to support
data Op = Plus | Minus | Mul | Div | Pow
        deriving (Eq, Show)

{- The core symbolic manipulation type.  It can be a simple number,
a symbol, a binary arithmetic operation (such as +), or a unary
arithmetic operation (such as cos)

Notice the types of BinaryArith and UnaryArith: it's a recursive
type.  So, we could represent a (+) over two SymbolicManips. -}
data Num a => SymbolicManip a = 
          Number a           -- Simple number, such as 5
        | Symbol String      -- A symbol, such as x
        | BinaryArith Op (SymbolicManip a) (SymbolicManip a)
        | UnaryArith String (SymbolicManip a)
          deriving (Eq)

{- SymbolicManip will be an instance of Num.  Define how the Num
operations are handled over a SymbolicManip.  This will implement things
like (+) for SymbolicManip. -}
instance Num a => Num (SymbolicManip a) where
    a + b = BinaryArith Plus a b
    a - b = BinaryArith Minus a b
    a * b = BinaryArith Mul a b
    negate a = BinaryArith Mul (Number (-1)) a
    abs a = UnaryArith "abs" a
    signum _ = error "signum is unimplemented"
    fromInteger i = Number (fromInteger i)

{- Make SymbolicManip an instance of Fractional -}
instance (Fractional a) => Fractional (SymbolicManip a) where
    a / b = BinaryArith Div a b
    recip a = BinaryArith Div (Number 1) a
    fromRational r = Number (fromRational r)

{- Make SymbolicManip an instance of Floating -}
instance (Floating a) => Floating (SymbolicManip a) where
    pi = Symbol "pi"
    exp a = UnaryArith "exp" a
    log a = UnaryArith "log" a
    sqrt a = UnaryArith "sqrt" a
    a ** b = BinaryArith Pow a b
    sin a = UnaryArith "sin" a
    cos a = UnaryArith "cos" a
    tan a = UnaryArith "tan" a
    asin a = UnaryArith "asin" a
    acos a = UnaryArith "acos" a
    atan a = UnaryArith "atan" a
    sinh a = UnaryArith "sinh" a
    cosh a = UnaryArith "cosh" a
    tanh a = UnaryArith "tanh" a
    asinh a = UnaryArith "asinh" a
    acosh a = UnaryArith "acosh" a
    atanh a = UnaryArith "atanh" a

{- Show a SymbolicManip as a String, using conventional
algebriac notation -}
prettyShow :: (Show a, Num a) => SymbolicManip a -> String

-- Show a number or symbol as a bare number or serial
prettyShow (Number x) = show x
prettyShow (Symbol x) = x

prettyShow (BinaryArith op a b) =
    let pa = simpleParen a
        pb = simpleParen b
        pop = op2str op
        in pa ++ pop ++ pb
prettyShow (UnaryArith op a) = 
    op ++ "(" ++ show a ++ ")"

op2str :: Op -> String
op2str Plus = "+"
op2str Minus = "-"
op2str Mul = "*"
op2str Div = "/"
op2str Pow = "**"

{- Add parenthesis where needed.  This function is fairly conservative
and will add parenthesis when not needed in some cases.
    
Haskell will have already figured out precedence for us while building
up the SymbolicManip. -}
simpleParen :: (Show a, Num a) => SymbolicManip a -> String
simpleParen (Number x) = prettyShow (Number x)
simpleParen (Symbol x) = prettyShow (Symbol x)
simpleParen x@(BinaryArith _ _ _) = "(" ++ prettyShow x ++ ")"
simpleParen x@(UnaryArith _ _) = prettyShow x

{- Showing a SymbolicManip calls the prettyShow function on it -}
instance (Show a, Num a) => Show (SymbolicManip a) where
    show a = prettyShow a
    
{- Show a SymbolicManip using RPN.  HP calculator users may
find this familiar. -}
rpnShow :: (Show a, Num a) => SymbolicManip a -> String
rpnShow i =
    let toList (Number x) = [show x]
        toList (Symbol x) = [x]
        toList (BinaryArith op a b) = toList a ++ toList b ++
           [op2str op]
        toList (UnaryArith op a) = toList a ++ [op]
        join :: [a] -> [[a]] -> [a]
        join delim l = concat (intersperse delim l)
    in join " " (toList i)

{- Perform some basic algebraic simplifications on a SymbolicManip. -}
simplify :: (Num a) => SymbolicManip a -> SymbolicManip a
simplify (BinaryArith op ia ib) = 
    let sa = simplify ia
        sb = simplify ib
        in
        case (op, sa, sb) of 
                (Mul, Number 1, b) -> b
                (Mul, a, Number 1) -> a
                (Mul, Number 0, b) -> Number 0
                (Mul, a, Number 0) -> Number 0
                (Div, a, Number 1) -> a
                (Plus, a, Number 0) -> a
                (Plus, Number 0, b) -> b
                (Minus, a, Number 0) -> a
                _ -> BinaryArith op sa sb
simplify (UnaryArith op a) = UnaryArith op (simplify a)
simplify x = x
--------------------------------------------------
-- Units of measure support
--------------------------------------------------

{- New data type: Units.  A Units type contains a number
and a SymbolicManip, which represents the units of measure.
A simple label would be something like (Symbol "m") -}
data Num a => Units a = Units a (SymbolicManip a)
           deriving (Eq)

{- Implement Units for Num.  We don't know how to convert between
arbitrary units, so we generate an error if we try to add numbers with
different units.  For multiplication, generate the appropriate
new units. -}
instance (Num a) => Num (Units a) where
    (Units xa ua) + (Units xb ub) 
        | ua == ub = Units (xa + xb) ua
        | otherwise = error "Mis-matched units in add"
    (Units xa ua) - (Units xb ub) = (Units xa ua) + (Units (xb * (-1)) ub)
    (Units xa ua) * (Units xb ub) = Units (xa * xb) (ua * ub)
    negate (Units xa ua) = Units (negate xa) ua
    abs (Units xa ua) = Units (abs xa) ua
    signum (Units xa _) = Units (signum xa) (Number 1)
    fromInteger i = Units (fromInteger i) (Number 1)

{- Make Units an instance of Fractional -}
instance (Fractional a) => Fractional (Units a) where
    (Units xa ua) / (Units xb ub) = Units (xa / xb) (ua / ub)
    recip a = 1 / a
    fromRational r = Units (fromRational r) (Number 1)

{- Floating implementation for Units.

Use some intelligence for angle calculations: support deg and rad
-}
instance (Floating a) => Floating (Units a) where
    pi = (Units pi (Number 1))
    exp _ = error "exp not yet implemented in Units"
    log _ = error "log not yet implemented in Units"
    (Units xa ua) ** (Units xb ub) 
        | ub == Number 1 = Units (xa ** xb) (ua ** Number xb)
        | otherwise = error "units for RHS of ** not supported"
    sqrt (Units xa ua) = Units (sqrt xa) (sqrt ua)
    sin (Units xa ua) 
        | ua == Symbol "rad" = Units (sin xa) (Number 1)
        | ua == Symbol "deg" = Units (sin (deg2rad xa)) (Number 1)
        | otherwise = error "Units for sin must be deg or rad"
    cos (Units xa ua) 
        | ua == Symbol "rad" = Units (cos xa) (Number 1)
        | ua == Symbol "deg" = Units (cos (deg2rad xa)) (Number 1)
        | otherwise = error "Units for cos must be deg or rad"
    tan (Units xa ua)
        | ua == Symbol "rad" = Units (tan xa) (Number 1)
        | ua == Symbol "deg" = Units (tan (deg2rad xa)) (Number 1)
        | otherwise = error "Units for tan must be deg or rad"
    asin (Units xa ua) 
        | ua == Number 1 = Units (rad2deg $ asin xa) (Symbol "deg")
        | otherwise = error "Units for asin must be empty"
    acos (Units xa ua)
        | ua == Number 1 = Units (rad2deg $ acos xa) (Symbol "deg")
        | otherwise = error "Units for acos must be empty"
    atan (Units xa ua)
        | ua == Number 1 = Units (rad2deg $ atan xa) (Symbol "deg")
        | otherwise = error "Units for atan must be empty"
    sinh = error "sinh not yet implemented in Units"
    cosh = error "cosh not yet implemented in Units"
    tanh = error "tanh not yet implemented in Units"
    asinh = error "asinh not yet implemented in Units"
    acosh = error "acosh not yet implemented in Units"
    atanh = error "atanh not yet implemented in Units"

{- A simple function that takes a number and a String and returns an
appropriate Units type to represent the number and its unit of measure -}
units :: (Num z) => z -> String -> Units z
units a b = Units a (Symbol b)

{- Extract the number only out of a Units type -}
dropUnits :: (Num z) => Units z -> z
dropUnits (Units x _) = x
                                                    
{- Showing units: we show the numeric component, an underscore,
then the prettyShow version of the simplified units -}
instance (Show a, Num a) => Show (Units a) where
    show (Units xa ua) = show xa ++ "_" ++ prettyShow (simplify ua)

{- Utilities for the Unit implementation -}
deg2rad x = 2 * pi * x / 360
rad2deg x = 360 * x / (2 * pi)

--------------------------------------------------
-- Things to play with
--------------------------------------------------

test :: (Num a) => a
test = 2 * 5 + 3

Here we have done what we set out to accomplish: implemented more instances for SymbolicManip. We have also introduced another type called Units which stores a number and a unit of measure. We implement several show-like functions which render the SymbolicManip or Units in different ways.

There is one other point that this example drives home. In many languages, even those with objects and overloading, there are still some parts of the language that are special in some way. In Haskell, the "special" bits are extremely small. We have just developed a new representation for something as fundamental as a number, and it has been really quite easy. Haskell takes code reuse and interchangability to the extreme. It is easy to make code generic and work on things of many different types. It's also easy to make up new types and make them automatically be first-class features of the system.

Taking advantage of functions as data

In an imperative language, appending two lists is cheap and easy. Here's a simple C implementation in which we maintain a pointer to the head and tail of a list.

struct node {
    struct node *next;
};

struct list {
    struct node *head, *tail;
};
    
void list_append(struct list *list, struct list *new_tail)
{
    list->tail->next = new_tail->head;
    list->tail = new_tail->tail;

    /* new_tail is no longer valid */

    new_tail->head = NULL;
    new_tail->tail = NULL;
}

When we want to append another list onto the end, we modify the last node of the existing list to point to its head node, then update its tail pointer to point to its tail node.

Obviously, this approach is off limits to us in Haskell if we want to stay pure. Since pure data is immutable, we can't go around modifying lists in place. Haskell's (++) operator appends two lists by creating a new one.

(++) :: [a] -> [a] -> [a]
(x:xs) ++ ys = x : xs ++ ys
_ ++ ys = ys

From inspecting the code, we can see that the cost of creating a new list depends on the length of the initial list.

We often need to append lists over and over, to construct one big list. For instance, we might be generating the contents of a web page as a String, emitting a chunk at a time as we traverse some data structure. Each time we have a chunk of markup to add to the page, we will naturally want to append it onto the end of our existing String.

If a single append has a cost proportional to the length of the initial list, and each repeated append makes the initial list longer, we end up in an unhappy situation: the cost of all of the repeated appends is proportional to the square of the length of the final list.

To understand this, let's dig in a little. The (++) operator is right associative.

ghci> :info (++)
(++) :: [a] -> [a] -> [a] 	-- Defined in GHC.Base
infixr 5 ++

This means that a Haskell implementation will evaluate the expression "a" ++ "b" ++ "c" as if we had put parentheses around it as follows: "a" ++ ("b" ++ "c"). This makes good performance sense, because it keeps the left operand as short as possible.

When we repeatedly append onto the end of a list, we defeat this associativity. Let's say we start with the list "a" and append "b", and save the result as our new list. If we later append "c" onto this new list, our left operand is now "ab". In this scheme, every time we append, our left operand gets longer.

Meanwhile, the imperative programmers are cackling with glee, because the cost of their repeated appends only depends on the number of them that they perform. They have linear performance; ours is quadratic.

When something as common as repeated appending of lists imposes such a performance penalty, it's time to look at the problem from another angle.

The expression ("a"++) is a section, a partially applied function. What is its type?

ghci> :type ("a" ++)
("a" ++) :: [Char] -> [Char]

Since this is a function, we can use the (.) operator to compose it with another section, let's say ("b"++).

ghci> :type ("a" ++) . ("b" ++)
("a" ++) . ("b" ++) :: [Char] -> [Char]

Our new function has the same type. What happens if we stop composing functions, and instead provide a String to the function we've created?

ghci> let f = ("a" ++) . ("b" ++)
ghci> f []
"ab"

We've appended the strings! We're using these partially applied functions to store data, which we can retrieve by providing an empty list. Each partial application of (++) and (.) represents an append, but it doesn't actually perform the append.

There are two very interesting things about this approach. The first is that the cost of a partial application is constant, so the cost of many partial applications is linear. The second is that when we finally provide a [] value to unlock the final list from its chain of partial applications, application proceeds from right to left. This keeps the left operand of (++) small, and so the overall cost of all of these appends is linear, not quadratic.

By choosing an unfamiliar data representation, we've avoided a nasty performance quagmire, while gaining a new perspective on the usefulness of treating functions as data. By the way, this is an old trick, and it's usually called a difference list.

We're not yet finished, though. As appealing as difference lists are in theory, ours won't be very pleasant in practice if we leave all the plumbing of (++), (.), and partial application exposed. We need to turn this mess into something pleasant to work with.

Turning difference lists into a proper library

Our first step is to use a newtype declaration to hide the underlying type from our users. We'll create a new type, and call it DList. Like a regular list, it will be a parameterised type.

newtype DList a = DL {
      unDL :: [a] -> [a]
    }

The unDL function is our deconstructor, that removes the DL constructor. When we go back and decide what we want to export from our module, we will omit our data constructor and deconstruction function, so the DList type will be completely opaque to our users. They'll only be able to work with the type using the other functions we export.

append :: DList a -> DList a -> DList a
append xs ys = DL (unDL xs . unDL ys)

Our append function may seem a little complicated, but it's just doing some simple book-keeping around the use of the (.) operator that we demonstrated earlier. To compose our functions, we must first unwrap them from their DL constructor, hence the uses of unDL. We then re-wrap the resulting function with the DL constructor so that it will have the right type.

Here's another way of writing the same function, in which we perform the unwrapping of xs and ys via pattern matching.

append' :: DList a -> DList a -> DList a
append' (DL xs) (DL ys) = DL (xs . ys)

Our DList type won't be much use if we can't convert back and forth between the DList representation and a regular list.

fromList :: [a] -> DList a
fromList xs = DL (xs ++)

toList :: DList a -> [a]
toList (DL xs) = xs []

Once again, compared to the original versions of these functions that we wrote, all we're doing is a little book-keeping to hide the plumbing.

If we want to make DList useful as a substitute for regular lists, we need to provide some more of the common list operations.

empty :: DList a
empty = DL id

-- equivalent of the list type's (:) operator
cons :: a -> DList a -> DList a
cons x (DL xs) = DL ((x:) . xs)
infixr `cons`

dfoldr :: (a -> b -> b) -> b -> DList a -> b
dfoldr f z xs = foldr f z (toList xs)

Although the DList approach makes appends cheap, not all list-like operations are easily available. The head function has constant cost for lists. Our equivalent requires that we convert the entire DList to a regular list, so it is much more expensive than its list counterpart: its cost is linear.

safeHead :: DList a -> Maybe a
safeHead xs = case toList xs of
                (y:_) -> Just y
                _ -> Nothing

To support an equivalent of map, we can make our DList type a functor.

dmap :: (a -> b) -> DList a -> DList b
dmap f = dfoldr go empty
    where go x xs = cons (f x) xs

instance Functor DList where
    fmap = dmap

Once we decide that we have written enough equivalents of list functions, we go back to the top of our source file, and add a module header.

module DList
    (
      DList
    , fromList
    , toList
    , empty
    , append
    , cons
    , dfoldr
    ) where

Lists, difference lists, and monoids

In abstract algebra, there exists a simple abstract structure called a monoid. Many mathematical objects are monoids, because the bar to entry is very low. In order to be considered a monoid, an object must have two properties.

  • An associative binary operator. Let's call it (*): the expression a * (b * c) must give the same result as (a * b) * c.

  • An identity value. If we call this e, it must obey two rules: a * e == a and e * a == a.

The rules for monoids don't say what the binary operator must do, merely that such an operator must exist. Because of this, lots of mathematical objects are monoids. If we take addition as the binary operator and zero as the identity value, integers form a monoid. With multiplication as the binary operator and one as the identity value, integers form a different monoid.

Monoids are ubiquitous in Haskell. The Monoid type class is defined in the Data.Monoidmodule.

class Monoid a where
    mempty  :: a                -- the identity
    mappend :: a -> a -> a      -- associative binary operator

If we take (++) as the binary operator and [] as the identity, lists form a monoid.

instance Monoid [a] where
    mempty  = []
    mappend = (++)

Since lists and DLists are so closely related, it follows that our DList type must be a monoid, too.

instance Monoid (DList a) where
    mempty = empty
    mappend = append

Let's try our the methods of the Monoid type class in ghci.

ghci> "foo" `mappend` "bar"
"foobar"
ghci> toList (fromList [1,2] `mappend` fromList [3,4])
[1,2,3,4]
ghci> mempty `mappend` [1]
[1]
[Tip]Tip

Although from a mathematical perspective, integers can be monoids in two different ways, we can't write two differing Monoid instances for Int in Haskell: the compiler would complain about duplicate instances.

In those rare cases where we really need several Monoid instances for the same type, we can use some newtype trickery to create distinct types for the purpose.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
                   
newtype AInt = A { unA :: Int }
    deriving (Show, Eq, Num)

-- monoid under addition
instance Monoid AInt where
    mempty = 0
    mappend = (+)

newtype MInt = M { unM :: Int }
    deriving (Show, Eq, Num)

-- monoid under multiplication
instance Monoid MInt where
    mempty = 1
    mappend = (*)

We'll then get different behaviour depending on the type we use.

ghci> 2 `mappend` 5 :: MInt
M {unM = 10}
ghci> 2 `mappend` 5 :: AInt
A {unA = 7}

We will have more to say about difference lists and their monoidal nature in the section called “The writer monad and lists”.

Want to stay up to date? Subscribe to the comment feed for this chapter, or the entire book.

Copyright 2007 Bryan O'Sullivan, Don Stewart, and John Goerzen. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 License. Icons by Paul Davey aka Mattahan.