Monday, January 20, 2014

How to use Haddock

Haddock, Haskell's documentation generator of choice, is the de facto means of communicating the purpose of a module or function in the Haskell ecosystem. Despite this, it is not covered in Learn You a Haskell, and is only mentioned in passing in Real World Haskell. Even so, being able to write Haddock is as crucial as being able to write Javadoc or Doxygen comments. Because GHC is exposed as a library, Haddock is able to determine the type signatures of your library's functions by parsing its code. So, what is left up to the programmer?

Documenting a Module
The first step is to document the module you're working on. In the picture shown, the Portability, Stability, and Maintainer entries all come from the module documentation, as do the Module, Copyright, and License fields. This is generated by a multi-line comment with the special Haddock character, "|", in the front position. That's a pipe, not a 1 or an l.

The multi-line comment to generate that information in the document is:
{- |
   Module      :  OpenSSL.Digest.ByteString
   Copyright   :  (c) 2010 by Peter Simons
   License     :  BSD3

   Maintainer  :
   Stability   :  provisional
   Portability :  portable

   Wrappers for "OpenSSL.Digest" that supports 'ByteString'.
-}
The field names are on the left, separated by a colon from each field's value. Stability and Portability are subjective. If your module's API changes frequently, or you know that it will change in the foreseeable future, mark the Stability field as 'Unstable'. If your module depends on Haskell code not included in a default GHC installation, on the Foreign Function Interface, or on a feature only found in a very recent or very old version of GHC, mark your module as Non-portable, and possibly give a reason why. For instance, in one of my modules, I have the line "Portability :   Portable (standalone - ghc)", meaning that it does not rely on any other modules within its own namespace, nor on any module not included in a default GHC installation.
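As a rough sketch of where the header sits in a complete file, the hypothetical module below places the comment directly above the module declaration. The module name Data.List.Rev, the author, the address, and the rev function are all placeholders made up for illustration; only the field names come from the header shown above.

```haskell
{- |
   Module      :  Data.List.Rev
   Copyright   :  (c) 2014 by Example Author
   License     :  BSD3

   Maintainer  :  author@example.com
   Stability   :  unstable
   Portability :  portable

   A hypothetical module exposing a single list-reversal function.
   Every name in this header is a placeholder.
-}
module Data.List.Rev (rev) where

-- Only exported names (here, rev) appear in the generated pages.
rev :: [a] -> [a]
rev = foldl (flip (:)) []
```

Marking the module as unstable here is just an example of the judgement call described above, not a recommendation.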

Documenting your Function, Typeclass, Method, Instance, etc.
This is the last type of documentation you need to add to your Haskell module to bring its documentation up to speed. A Haddock comment is a single-line comment in which a pipe (|) immediately follows the comment marker (--). It sits directly above the declaration it documents, which for a function usually means the line above its type signature (if one is included).

Take this trivial function:

rev :: [a] -> [a]
rev [] = []
rev (x:xs) = rev xs ++ [x]

GHC already knows the type signature is [a] -> [a], because you told it. For good measure, it would know even if you hadn't, thanks to type inference. Because of this, there's no need to describe the types of data a function can receive. You need only discuss the purpose of the function. Here is the function rewritten with a Haddock comment:

-- | Reverses a finite list. 
rev :: [a] -> [a]
rev [] = []
rev (x:xs) = rev xs ++ [x]
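The same comment style carries over to the other kinds of declaration named in this section's heading. The sketch below is only an illustration: the Shape class, the Circle and Square types, and the area method are hypothetical names, not part of any library.

```haskell
-- | Things that occupy a measurable area.
class Shape a where
  -- | The area of the shape, in square units.
  area :: a -> Double

-- | A circle described by its radius.
newtype Circle = Circle Double

-- | A square described by the length of one side.
newtype Square = Square Double

-- | Circles use the usual pi * r * r formula.
instance Shape Circle where
  area (Circle r) = pi * r * r

-- | Squares multiply the side length by itself.
instance Shape Square where
  area (Square s) = s * s
```

Each comment attaches to the declaration immediately below it, so the class, its method, the types, and the instances each get their own entry.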

Now you see that it's really super easy (and, dare I say, fun?) to document your Haskell code. However, some people believe that "Good code speaks for itself", and that that idiom holds especially true for ML derivatives with type signatures. In fact, superfluous comments will actually clutter your code more than anything. What's the etiquette for commenting?

I will leave that "as an exercise for the reader", as many Haskell bloggers often do for various topics. My personal preference is to always include Haddock comments in code that will be parsed and documented for me anyway. If you maintain a package on Hackage, the popular repository for Haskell, put a bit of Haddock in your code, even for self-explanatory functions. If you're just going to upload it to GitHub and forget about it, there's little point in using Haddock, unless you're a stickler for good documentation. Use good judgement on when and when not to use this powerful tool.

Sunday, January 19, 2014

Using Monads in Haskell

Haskell is a bit of a quirky language, to say the least. One of the most talked-about features of Haskell is the monad, which is a minimal context that essentially "wraps" a value. Monads are not built into the language itself; Monad is an ordinary typeclass, and its use is ubiquitous. All monads must obey the three monad laws: left identity, right identity, and associativity. Academics can argue these laws all day, and with good reason, but what about a real programmer more interested in getting real work done than writing proofs for their code? This is a quick run-down to teach those types how they can use monads in their code.
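For readers who want to see what the three laws actually say, here is a minimal sketch that states each one as a Boolean check over the Maybe monad. The helper functions half and decrement are hypothetical and exist only to give the laws something to act on; checking a few values is an illustration, not a proof.

```haskell
-- Two hypothetical functions that return a value wrapped in Maybe.
half :: Int -> Maybe Int
half n = if even n then Just (n `div` 2) else Nothing

decrement :: Int -> Maybe Int
decrement n = if n > 0 then Just (n - 1) else Nothing

-- Left identity: wrapping a value and binding it is the same as applying f.
leftIdentity :: Int -> Bool
leftIdentity a = (return a >>= half) == half a

-- Right identity: binding into return changes nothing.
rightIdentity :: Maybe Int -> Bool
rightIdentity m = (m >>= return) == m

-- Associativity: the grouping of a chain of binds does not matter.
associativity :: Maybe Int -> Bool
associativity m =
  ((m >>= half) >>= decrement) == (m >>= (\x -> half x >>= decrement))
```

In day-to-day code the laws mostly matter because they let you regroup and refactor chains of monadic operations without changing what they do.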

Do Notation
The main way to handle monads is do-notation. It can "unwrap" monadic values easily, using do and the <- arrow. Here is an example with the IO monad, inside of the main function.

import System.IO

main = do
  fileHandle <- openFile "test.txt" ReadMode
  x <- hGetContents fileHandle
  let xlines = lines x
  return ()

In this block of code, the Handle is "extracted" from the IO monad in the line beginning "fileHandle <-". The <- arrow takes a monadic value and binds the value wrapped inside it to a name. Afterwards, a String is taken out of an IO String and bound to x. Note that hGetContents reads the file lazily, so the contents are only read as they are consumed. The next important line, "return ()", uses one of Monad's core methods to wrap () in IO, because main must be an IO action. However, you can see that it is tedious to bind so many variables. So, here is how to use another one of Monad's methods, >>=, to chain actions together.

import System.IO

main = do
  x <- openFile "test.txt" ReadMode >>= hGetContents
  let xlines = lines x
  return ()

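This is such an essential usage of monads that >>= is worked into the Haskell logo. As you can see, the result of the openFile action is "pushed" into hGetContents. It's important to note that >>= takes a monadic value on its left (an IO Handle) and a function on its right that receives the unwrapped Handle and returns another monadic value (an IO String). The result of the whole chain is still wrapped in IO; <- is what binds its contents to a name.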
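To make the relationship between the two styles concrete, here is a small sketch of the same file read written three ways. The function names readWithDo, readWithBind, and readPointFree are hypothetical, and the desugaring shown is the rough shape of what the compiler does rather than its exact output.

```haskell
import System.IO

-- | Read a file using do-notation and an explicit bind.
readWithDo :: FilePath -> IO String
readWithDo path = do
  fileHandle <- openFile path ReadMode
  hGetContents fileHandle

-- | The same action written with >>= and a lambda, which is
-- roughly what the do block above desugars to.
readWithBind :: FilePath -> IO String
readWithBind path =
  openFile path ReadMode >>= \fileHandle -> hGetContents fileHandle

-- | The lambda dropped, matching the one-liner style used in the text.
readPointFree :: FilePath -> IO String
readPointFree path = openFile path ReadMode >>= hGetContents
```

All three have the type FilePath -> IO String, so a caller still needs <- (or another >>=) to get at the String.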

Functors and Monads
So, you still have to use the <- to "de-monad" the monad you get before it can be used, right? Wrong. There is a function called fmap, the sole method of the Functor typeclass, which applies to a very wide range of types. Every Monad is also a Functor (the monad laws imply that fmap f m behaves like m >>= return . f), so it damn well better be able to manipulate our monads. Here is the program once again rewritten:

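import System.IO
main = fmap lines (openFile "test.txt" ReadMode >>= hGetContents) >>= mapM_ putStrLn

Here, fmap applies lines, a pure function of type String -> [String], to the String inside the IO String produced by the openFile and hGetContents chain. The result is an IO [String], which >>= hands to mapM_ putStrLn to print each line. fmap is for pure functions like lines; mapping putStrLn itself, which has type String -> IO (), would give an IO (IO ()) whose inner action never runs, so nothing would be printed. Nifty? Nifty.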
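To show how widely fmap applies, here is a small sketch that maps one pure function over three different contexts. The names shout, shoutMaybe, shoutAll, and shoutFile are hypothetical, and readFile from the Prelude stands in for the openFile and hGetContents chain to keep the example short.

```haskell
import Data.Char (toUpper)

-- | A pure function with no knowledge of any context.
shout :: String -> String
shout = map toUpper

-- | fmap over Maybe: the function is applied only if a value is present.
shoutMaybe :: Maybe String -> Maybe String
shoutMaybe = fmap shout

-- | fmap over a list: the function is applied to every element.
shoutAll :: [String] -> [String]
shoutAll = fmap shout

-- | fmap over IO: the function is applied to the result of the action.
shoutFile :: FilePath -> IO String
shoutFile path = fmap shout (readFile path)
```

In every case the context (Maybe, list, or IO) is left in place and only the value inside it changes.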

So, the last piece of the puzzle is this: you have multiple monadic values, and you want to combine their results without binding temporary variables. This can be done with applicative functors. They are like functors, but with some added goodies. To use applicative functors, import Control.Applicative and test out this chunk of code:

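import System.IO
import Control.Applicative
main = pure (++) <*> (openFile "test.txt" ReadMode >>= hGetContents) <*> (openFile "test2.txt" ReadMode >>= hGetContents) >>= putStrLn

This takes the ++ function, lifts it into the IO applicative with pure, and then applies it to each argument in turn with <*>. After the first <*> it is partially applied, and after the second the two file contents are concatenated inside IO. The final >>= putStrLn prints the combined String; wrapping the whole expression in fmap putStrLn would instead build an IO (IO ()) that never prints. Since pure f <*> x is the same as f <$> x, where <$> is just infix fmap, the expression can also begin with (++) <$> instead of pure (++) <*>.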
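As a sketch of the equivalent spellings, the three functions below combine two file reads in the same way. The function names are hypothetical, readFile from the Prelude stands in for the openFile and hGetContents chain, and on recent GHC versions the Control.Applicative import is only needed for liftA2.

```haskell
import Control.Applicative

-- | Concatenate two files using pure and <*>.
concatWithPure :: FilePath -> FilePath -> IO String
concatWithPure a b = pure (++) <*> readFile a <*> readFile b

-- | The same thing with <$>, since pure f <*> x equals f <$> x.
concatWithFmap :: FilePath -> FilePath -> IO String
concatWithFmap a b = (++) <$> readFile a <*> readFile b

-- | The same thing with liftA2, which lifts a two-argument function.
concatWithLift :: FilePath -> FilePath -> IO String
concatWithLift a b = liftA2 (++) (readFile a) (readFile b)
```

Which spelling to use is mostly a matter of taste; the <$> form is probably the most common in published code.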

In Conclusion
Monads were a tricky subject for me for a long time. Most of my programs look like my first chunk of code, where I did one monadic operation per line. However, learning to master monads in Haskell to control side effects and efficiently keep track of your state is of utmost importance in commanding the language. So go on, don't pick up that monad. Wear that >>= with pride.