Down with Show! Part 2: What's wrong with the Show type class

This is part two of three in a series where I will argue that it is time to consign the Show type class to the dustbin of history. In the previous post, I discussed a few rules of thumb which I use to help me decide whether or not introducing a new type class is a good idea. In this post, I will discuss the Show type class itself, and where I think it falls short.

We’ll start by reminding ourselves what the class looks like (at least in PureScript):

class Show a where
  show :: a -> String

Haskellers may be wondering why I’ve missed out showsPrec and showList: the answer is that they’re not really relevant to this discussion.

What is Show for?

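I can think of at least three distinct situations where you might want a function of the type T -> String for some type T: serialization (turning a value into a string which another program can parse back), display (presenting a value to a user, e.g. in a user interface), and debugging (inspecting a value during development, for instance in the repl).

While the Show class does, on occasion, get used for all three of these things, I’m going to argue that it’s not particularly good at any of them.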

Firstly, it’s important to recognise that these three purposes really are distinct. A single function might suffice for all three in the case of basic types such as Int or Bool, but this is unlikely to remain true once you start dealing with more complex types, particularly those built out of sums and products (and you will inevitably meet such types as soon as you start, say, building an application).
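To make the distinction concrete, here is a minimal Haskell sketch of the three functions diverging for a small sum type. The Shape type and all three function names are made up for illustration; they are not taken from any library.

```haskell
-- Hypothetical application type: a sum of two products.
data Shape
  = Circle Int
  | Rect Int Int

-- Serialization: a compact, machine-readable format. It must be
-- injective so that a parser can recover the original value.
serializeShape :: Shape -> String
serializeShape (Circle r) = "circle:" ++ show r
serializeShape (Rect w h) = "rect:" ++ show w ++ "," ++ show h

-- Display: wording aimed at an end user. It is free to lose
-- information or to special-case values (here, squares).
displayShape :: Shape -> String
displayShape (Circle r) = "a circle of radius " ++ show r
displayShape (Rect w h)
  | w == h    = "a square of side " ++ show w
  | otherwise = "a " ++ show w ++ " by " ++ show h ++ " rectangle"

-- Debugging: mirrors the constructors, so that a developer can see
-- exactly which value they are holding.
debugShape :: Shape -> String
debugShape (Circle r) = "Circle " ++ showsPrec 11 r ""
debugShape (Rect w h) =
  "Rect " ++ showsPrec 11 w "" ++ " " ++ showsPrec 11 h ""
```

For an Int, all three would plausibly coincide with show; for Shape, no single String is a good answer to all three questions.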

If you know Python, you might be aware of the difference between __str__ and __repr__, which sort of goes along the same lines as the above. I think Python was right to separate these functions: __repr__ is designated as being for debugging, whereas __str__ is for ‘nicely printable’ string representations of objects, which is perhaps closer to what I’ve described as displaying values.

Let’s look at each of these purposes in turn, and how effective the Show class is in fulfilling them.

Serialization

Contrary to what Chapter 6 of Real World Haskell argues, I would say that Show should not be used for serialization (and I don’t expect this to be very controversial). If you want to interface with any software at all not written in Haskell, Show is going to be much more awkward than, say, JSON. If you do know you’re only ever going to communicate with other Haskell programs using the same data definitions, binary formats are likely to be a more efficient alternative.
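As a rough illustration of the awkwardness, compare the derived Show output for a small record with a JSON rendering of the same value. The Point type and both function names are hypothetical, and the JSON encoder is hand-rolled only to keep the sketch self-contained; in a real program you would more likely reach for a library such as aeson.

```haskell
-- Hypothetical record type, used only for this comparison.
data Point = Point { px :: Int, py :: Int }
  deriving (Show, Read, Eq)

-- The derived Show output is Haskell source syntax. Another Haskell
-- program with the same data definition can `read` it back, but
-- anything else has to implement a parser for Haskell record syntax.
viaShow :: Point -> String
viaShow = show

-- A hand-rolled JSON rendering of the same value, which practically
-- any language can parse out of the box.
viaJson :: Point -> String
viaJson (Point x y) =
  "{\"x\":" ++ show x ++ ",\"y\":" ++ show y ++ "}"
```

Here viaShow (Point 1 2) gives Point {px = 1, py = 2}, whereas viaJson (Point 1 2) gives {"x":1,"y":2}.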

Display

To address this use case, I am going to refer back to the rules of thumb for use of type classes from the previous post.

I would argue, therefore, that displaying values as strings for use in e.g. a user interface is a good example of a situation where regular functions are more appropriate than a type class.
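Here is a minimal sketch of what “regular functions” might look like for display. The Duration type and both function names are invented for this example, and the motivation in the comments (one type, several equally reasonable renderings) is my own illustration rather than a restatement of the rules of thumb.

```haskell
-- Hypothetical type that an application might need to show to users.
data Duration = Duration { hours :: Int, minutes :: Int }

-- A compact rendering, e.g. for a table cell.
displayCompact :: Duration -> String
displayCompact (Duration h m) = show h ++ "h " ++ show m ++ "m"

-- A verbose rendering, e.g. for a sentence in a notification.
displayVerbose :: Duration -> String
displayVerbose (Duration h m) =
  plural h "hour" ++ ", " ++ plural m "minute"
  where
    plural :: Int -> String -> String
    plural 1 unit = "1 " ++ unit
    plural n unit = show n ++ " " ++ unit ++ "s"
```

Both functions have the type Duration -> String, and neither is obviously the canonical one. A type class allows only one instance per type, so picking between them would need newtype wrappers; with plain functions, the call site simply names the one it wants.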

Debugging and the repl

So we’ve established that Show should not be used for serialization or display, and we have seen better alternatives for both of these purposes. That just leaves debugging and the repl to deal with.

Currently, in both Haskell and PureScript, the repl uses the Show type class by default. Suppose I type something into the repl which evaluates to a value of type T. If there is a Show T instance, then the result of showing whatever my expression evaluates to is printed. Otherwise, I get an error along the lines of No instance for (Show T).

From the perspective of showing values in the repl, the first major drawback of Show is that lots of types don’t have instances. This is perhaps a result of it trying to do too much at once.

Any function being used for serialization must be injective, which means that serializing two distinct values should give you two distinct results: this is crucial if we want other programs to be able to accept what we produce and reconstruct the same value.

If Show was originally intended to be used for serialization, then show of course would have needed to be injective for every instance. However, this rules out instances for a lot of types. For example, we can’t write an injective Show instance for functions a -> b, unless we know that a is only inhabited by finitely many values (because we need to check what our function does to each of these values). Even then, the output is very unlikely to be useful unless the type a has a very small number of inhabitants. But some of the most common types we use have a large number of inhabitants, or even infinitely many! Consider e.g. Int, Integer, or [a].
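The finite-domain case can be sketched in Haskell as follows. The name showFunction is hypothetical, and I am using the Bounded and Enum classes as a stand-in for “a has finitely many inhabitants which we can enumerate”.

```haskell
-- Render a function by tabulating it over every inhabitant of its
-- domain. Two functions which differ on any input get different
-- strings, provided that show is itself injective for a and b.
--
-- The table has one row per inhabitant of a: fine for Bool, but
-- hopeless for Int, which has 2^64 inhabitants on a typical
-- 64-bit GHC.
showFunction :: (Bounded a, Enum a, Show a, Show b) => (a -> b) -> String
showFunction f = show [ (x, f x) | x <- [minBound .. maxBound] ]
```

For example, showFunction not produces [(False,True),(True,False)]. Types such as Integer or [a] have no Bounded instance at all, so they cannot even be passed to this function as the domain.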

There are other examples, too. For example, there are a few types which we can’t do anything with unless we move into IO. The type IO itself is one: we can’t write an injective function IO () -> String, because we (rightly) have no tools to introspect a value of type IO (). Similarly, we can’t write an injective function IORef Int -> String. We can’t even read the value of the IORef at the time the function is evaluated and turn that into a string for use in our return value, unless we cheat and use something like unsafePerformIO, because we have to produce a String, not an IO String.

The fact that so many types don’t have instances is a significant hindrance to the purpose of seeing representations of values in the repl. You might argue that showing functions in the repl would be useless anyway, because what would we produce? However, it is very common to come across types which are products or records where one field is a function, or an IO T, or any other type which lacks a Show instance. In these cases, we can’t derive a Show instance, so if we want to be able to see values of this type in the repl, we would have to manually write a Show instance which skips over the problematic fields, and this is quite tiresome; in practice, we often don’t bother. Another option is to use orphan instances for problematic fields. For example, there is an orphan Show (a -> b) instance in the Haskell module Text.Show.Functions, which always produces the string "<function>". This is fine in languages which support orphan instances (note that PureScript does not), but far from ideal. One reason to avoid orphan instances is that you can’t opt in to them on a per-module basis; if I use an orphan instance in a certain module, then everyone else who imports that module also has that instance in scope, whether they want it or not.
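To give an idea of the tedium involved, here is a sketch of a hand-written instance for a record with one function field. The Handler type and its field names are hypothetical. Deriving Show for it is rejected by GHC because there is no Show (String -> IO ()) instance in scope, so the instance has to reproduce by hand what deriving would otherwise have generated.

```haskell
-- Hypothetical record with one field that has no Show instance.
data Handler = Handler
  { handlerName     :: String
  , handlerPriority :: Int
  , handlerRun      :: String -> IO ()
  }

-- Mimics the layout of a derived instance for record syntax, but
-- prints a placeholder in place of the function field.
instance Show Handler where
  showsPrec d (Handler name prio _) =
    showParen (d > 10) $
      showString "Handler {handlerName = " . shows name
        . showString ", handlerPriority = " . shows prio
        . showString ", handlerRun = <function>}"
```

With this in scope, show (Handler "log" 1 putStrLn) gives Handler {handlerName = "log", handlerPriority = 1, handlerRun = <function>}. Every field added to the record later means another manual edit to the instance.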

Another problem with Show is that, for larger types, the output is very difficult to extract useful information from, at least if we’re using the derived instances. Here’s an example of what happens if I try to show a Pursuit package, for instance:

λ: latest <- handler getLatestPackages
λ: handler (uncurry lookupPackage (latest !! 1))
Right (Package {pkgMeta = PackageMeta {bowerName = PackageName "purescript-web-storage", bowerDescription = Nothing, bowerMain = [], bowerModuleType = [], bowerLicense = ["MIT"], bowerIgnore = ["**/.*","bower_components","node_modules","output","bower.json","package.json"], bowerKeywords = [], bowerAuthors = [], bowerHomepage = Just "https://github.com/purescript-web/purescript-web-storage", bowerRepository = Just (Repository {repositoryUrl = "git://github.com/purescript-web/purescript-web-storage.git", repositoryType = "git"}), bowerDependencies = [(PackageName "purescript-web-events",VersionRange {runVersionRange = "^1.0.0"}),(PackageName "purescript-nullable",VersionRange {runVersionRange = "^4.0.0"})], bowerDevDependencies = [], bowerResolutions = [], bowerPrivate = False}, pkgVersion = Version {versionBranch = [2,0,0], versionTags = []}, pkgVersionTag = "v2.0.0", pkgTagTime = Just 2018-05-25 14:05:13 UTC, pkgModules = [Module {modName = ModuleName [ProperName {runProperName = "Web"},ProperName {runProperName = "Storage"},ProperName {runProperName = "Event"},ProperName {runProperName = "StorageEvent"}], modComments = Nothing, modDeclarations = [Declaration {declTitle = "StorageEvent", declComments = Nothing, declSourceSpan = Just (SourceSpan {spanName = "/home/travis/build/purescript-web/purescript-web-storage/src/Web/Storage/Event/StorageEvent.purs", spanStart = SourcePos {sourcePosLine = 21, sourcePosColumn = 1}, spanEnd = SourcePos {sourcePosLine = 21, sourcePosColumn = 41}}), declChildren = [], declInfo = ExternDataDeclaration (NamedKind (Qualified (Just (ModuleName [ProperName {runProperName = "Prim"}])) (ProperName {runProperName = "Type"})))},Declaration {declTitle = "fromEvent", declComments = Nothing, declSourceSpan = Just (SourceSpan {spanName = "/home/travis/build/purescript-web/purescript-web-storage/src/Web/Storage/Event/StorageEvent.purs", spanStart = SourcePos {sourcePosLine = 23, sourcePosColumn = 1}, spanEnd = SourcePos {sourcePosLine = 23, 
sourcePosColumn = 41}}), declChildren = [], ...

The Node.js repl handles cases like this one a lot better by using pretty-printing, and by omitting details beyond a certain depth:

> require('tls')
{ CLIENT_RENEG_LIMIT: 3,
  CLIENT_RENEG_WINDOW: 600,
  SLAB_BUFFER_SIZE: 10485760,
  DEFAULT_CIPHERS: 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:DHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA256:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA',
  DEFAULT_ECDH_CURVE: 'prime256v1',
  getCiphers: [Function],
  convertNPNProtocols: [Function],
  convertALPNProtocols: [Function],
  checkServerIdentity: [Function: checkServerIdentity],
  parseCertString: [Function: parseCertString],
  createSecureContext: [Function: createSecureContext],
  SecureContext: [Function: SecureContext],
  TLSSocket: { [Function: TLSSocket] super_: { [Function: Socket] super_: [Object] } },
  Server: { [Function: Server] super_: { [Function: Server] super_: [Object] } },
  createServer: [Function],
  connect: [Function],
  createSecurePair: [Function: deprecated] }

This is a slightly unfair example because the tls module is less complex than the Package type in Pursuit, but hopefully my point is clear. The string [Object] indicates that there is an object there, but it hasn’t been printed in full because it’s too deep in the structure. If we want to see it, we can drill down deeper into the structure:

> require('tls').Server
{ [Function: Server]
  super_: 
   { [Function: Server]
     super_: 
      { [Function: EventEmitter]
        EventEmitter: [Circular],
        usingDomains: true,
        defaultMaxListeners: [Getter/Setter],
        init: [Function],
        listenerCount: [Function] } } }

To me, this is an indication that a function of the type T -> String is the wrong choice for debugging or for showing values in the repl. Ideally we would be using a type which provides a tree structure, to allow us to do things like cutting off pretty-printing once we reach a certain depth. Changing the class to use a tree structure instead of just String also has the fortuitous effect of making it less tempting to abuse the class for other purposes, such as serialization or display.
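To illustrate why a tree makes depth cut-off easy, here is a deliberately tiny Haskell sketch. The Repr type and the render function are assumptions made up for this example; they are not the design presented in the next post, and a real version would also want to handle records, collections, and line wrapping.

```haskell
-- Hypothetical tree-shaped debugging representation.
data Repr
  = Leaf String        -- an atomic value, already rendered (e.g. "3")
  | Node String [Repr] -- a constructor name applied to its arguments

-- Render a tree, expanding at most `depth` levels of constructor
-- applications. Anything deeper is elided as "...", much like the
-- [Object] placeholder in the Node.js output. Leaves and nullary
-- constructors are always printed, since they are already small.
render :: Int -> Repr -> String
render _ (Leaf s) = s
render _ (Node name []) = name
render depth (Node name kids)
  | depth <= 0 = "..."
  | otherwise =
      "(" ++ unwords (name : map (render (depth - 1)) kids) ++ ")"
```

Given the tree Node "Just" [Node "Pair" [Leaf "1", Node "Just" [Leaf "2"]]], rendering with depth 2 gives (Just (Pair 1 ...)), and rendering with depth 3 gives (Just (Pair 1 (Just 2))). A function of type T -> String cannot offer this, because by the time the repl receives the String, the structure has already been thrown away.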

I first encountered the idea of using a tree structure instead of a String when @rightfold suggested it in a GitHub thread discussing this issue. This is a tremendously valuable insight; without it, I most likely never would have managed to come up with what I’m going to show you in the next post.

Summary

In this post, I’ve argued that the design of the Show class falls short on a number of fronts: most importantly, that it tries to do too much at once, and as a result, it’s not particularly good at any of the things it tries to do. I’ve also discussed its shortcomings from the perspective of showing things in the repl or for debugging purposes, and I’ve hinted at how we might be able to address these.

In the next post, I’ll talk in more detail about a design I’ve come up with in order to address these shortcomings.

Next up: Part 3: A replacement for Show