Haskell: Concatenation vs. Prepending

This confused me a little until I grokked it. In Haskell, an item is not the same as a list containing the one item. (Obvious, I know, but some languages like XQuery take the opposite view).

The : operator prepends an item to the beginning of a list. It can only be used to insert an item at the beginning of a list. It cannot be used to append an item to the end of a list. Thus x:xs is correct but xs:x is a syntax error. (Haskell’s design seems stuck with assumptions going all the way back to Lisp and 1950s era processors including that the singly linked list is a good implementation for the list abstract data type. This has other consequences. For instance, length is an O(n) operation. And now that I look for it, insert into the middle of the list also seems to be missing.)

Lists are concatenated with the ++ operator, not :.

So, to sum up:

  • Use : to insert an item at the beginning of a list.
  • Use ++ to join two lists together.
  • To append an item to a list, insert the item into an empty list, and then concatenate the two lists together.
  • More generally, to insert an item into a list at any position other than the first:
    1. Split the list into two sublists at the insertion point with splitAt.
    2. Use : to prepend the new item to the second list.
    3. Concatenate the two lists back together with ++

Am I missing something obvious here?

7 Responses to “Haskell: Concatenation vs. Prepending”

  1. John Cowan Says:

    Linked lists make a lot of sense in Haskell, where they are immutable, and in Lisp, where they are often used immutably. Array-based lists don’t, because they depend on imperative changes, and Haskell is pure. If you want arrays in Lisp, you know where to find them. Linked lists also have the advantage that they can share tails.

    Off-topic: There are two errors in your DeveloperWorks article on Java math, part 2. NaN does not, in fact, violate the law of the excluded middle. NaN == NaN is true, NaN != NaN is false, NaN == 1.0 is false, and NaN != 1.0 is true, just as classical logic would predict. What is now false is the law of trichotomy, which says that just one of a < b, a == b, and a > b is true; if either a or b or both are NaN, then a < b is false and so is a > b.

    The other problem is far more serious. Your discussion of loops like for (double x = 0.0; x < 10.0; x += 0.1) strains out a gnat and swallows a camel. The real WTF here is not that the values sometimes come out non-integral when you expected them to be integral, but that this loop iterates 101 times! Never, never, but never iterate using a floating-point loop counter, as you will often get too many iterations. Iterate with an integral counter of appropriate size and then do the floating-point math anew in each iteration. (If the integers are too big, you may get some identical results in successive iterations.)

  2. Elliotte Rusty Harold Says:

    The law of the excluded middle does not apply with NaN:

    $ more Test.java
    public class Test {
     
       public static void main(String[] args) {
          double x = 0.0/0.0;
          System.out.println("x == x: " + (x == x));
          System.out.println("x != x: " + (x != x));
       
       }
    
    }
    $ javac Test.java
    $ java Test
    x == x: false
    x != x: true
  3. Michael Day Says:

    The special syntax provided for lists can obscure the simplicity of their definition. When learning a functional programming language it can be worth defining the list type yourself and working through some examples of common functions such as append, map, folds, and so on:
    data MyList a = Nil | Cons a (MyList a)
    myapp Nil ys = ys
    myapp (Cons x xs) = Cons x (myapp xs ys)Another thing to keep in mind is that functional languages go to some lengths these days to optimise away the creation of lists when they can. For example, if you have one function that returns a list of values, and another function that sums them up, the compiler may be able to optimise away the actual creation of the list itself and just update a single accumulator value. Ideally this results in simpler, decoupled code, that still performs well.For situations when you really do need random access to a sequence of values, most languages will provide some kind of array type, but the convenience of lists for functional programming make them the first thing people usually reach for when faced with a problem. Recently we have been optimising Prince, and that involved much careful examination of the way we were manipulating lists of characters and glyphs. (Prince is written in Mercury, not Haskell, but the list types are identical, only strict instead of lazy, which is going to require another comment to discuss :)

  4. Russell Says:

    Haskell lists are best used for instances where they are traversed once and then discarded. If you want to do some more serious manipulations, you need to use a different data type. I recommend Data.Sequence.Seq which can be downloaded from hackage.

  5. Cale Gibbard Says:

    A good way to think of lists in Haskell is that they are loops which have not yet happened (or if you prefer, the indices over which a loop will occur). If you want random access, lists are the wrong datatype. But just as loops are common in imperative programming, lists are common in Haskell.

    A loop either has no iterations, or it consists of a single iteration followed by another loop.

    A list either has no elements, or it consists of a single element followed by another list.

    This view is made more relevant by the fact that the tail of a list might not be computed yet, instead remaining an expression which will be evaluated when needed. Computations which only need one element of a list at a time can run in constant space, generating and consuming the list at the same time.

  6. John Cowan Says:

    I was wrong to say that NaN == NaN, but you were wrong to say that that violates the law of the excluded middle, which simply says that all truth claims are either true or false, tertium non datur (there is no third case). In fact NaN != NaN, which means that Java == is not a reflexive relation and therefore does not count as an identity relation, because identity relations must be reflexive (x equals x), symmetric (if x equals y, y equals x) and transitive (if x equals y and y equals z, then x equals z). Fortunately, it is still true to say in Java that whenever == is true, != is false.

    SQL equality does violate the law of the excluded middle, because 10 = 10 is true and 10 = 11 is false, but 10 = null is neither true nor false.

  7. Steve Gunnell Says:

    I have to agree with Elliotte that I can’t understand why Haskell thinks the x:xs construct is the one true way. It seems too implementation bound to Lisp. I can understand why indexing causes purity problems but surely an xs:x constructor is just as valid as x:xs?

Leave a Reply