GSOC progress update #3: Pursuit deployed!
The most significant development since my last entry is that Pursuit has been deployed! See the announcement post I wrote.
Things that have changed since the last update:
Hoogle integration is now set up and working. The way it works currently is:
- We go over every package in the database, select the latest version of each, and generate a Hoogle input file from each one.
- We concatenate all of these into one massive input file.
- We pass this to the
createDatabasefunction exported by the Hoogle library to create a Hoogle
- We periodically repeat the above 3 steps, and store the resulting
TVarso that later requests can access it.
- When a query comes in, we parse it (again with the Hoogle library),
perform a search, and then convert the
Hoogle.Resultvalues we get back into a slightly different format, which makes it a bit easier to do what we want to do with them,
- Finally, we turn this data into either JSON or HTML (based on the request’s Accept header) and send it back in the response.
It’s still quite basic and has a few obvious flaws (notably, rows look horrendous in the search results, and also searching for rows fails with a syntax error) but it does work.
It seems as though Hoogle 5 will have an improved API which will make this process a lot less error prone, and hopefully faster, too, as well as improving many of the current deficiencies of Pursuit’s Hoogle search interface, so I think we will look at upgrading very soon after it is released.
The caching system has been replaced with a simpler one: when a cacheable page is served, the Yesod application will also write a file with the same contents as the response, to a location on the disk mirroring the current URL path. Then, a proxy server in front of the Yesod application (for us, nginx) can check for the existence of such a file, and serve it if it exists, with appropriate caching headers. The server knows when any cached file should expire, since this only happens when new versions of things get uploaded, and simply deletes cached files where necessary.
We got rid of the old 2-step upload-then-verify process, in favour of a one-step process that requires authentication at the time of upload (using GitHub OAuth, as before). It’s simpler, and we don’t have to worry about generating secure verification links, or the disk filling up with unverified, uploaded packages.
psc-publish now has a
--dry-run option, which doesn’t require the current
commit to be tagged, and doesn’t produce any output. The intended use for this
flag is that you run
psc-publish --dry-run just before tagging a version, to
check that it is possible to produce Pursuit data. If it works, you go ahead
and tag the version, run
psc-publish for real, and upload the result to
Soon, I intend to contribute a feature to
pulp which performs the above for
you, as well as bumping the version and tagging a commit for the release.
Uploading packages to Pursuit directly from the command line should be
possible; the only difficult part is the authentication (which is not that
The compile-time code is all gone and replaced with configuration via environment variables at runtime. This is nice because it means that we’re free to compile a binary for production in any environment; on my machine, on Travis, on the server, wherever.
Prim types, such as
Int, now work. All that was
needed was a pretend
Prim module (which I created, see
https://github.com/hdgarrood/purescript-prim), and a couple of special cases
added inside the HTML documentation rendering module.
A keyboard shortcut for the search box: just hit S and the search box is focused. This idea was shamelessly copied from Rust’s equivalent tool, and I think it adds a really nice touch.
I added some basic client-side code which validates packages just before you upload them, and also which displays the package name and version next to the form, to reassure you that you selected the right JSON file.
I made some CSS adjustments and media queries, so that Pursuit is usable with pretty much any device width.
Also, various polishing; things like making 404 errors nicer, adding a help page, and adding an upload link to the top banner.
Content Security Policy headers, to mitigate the risk of XSS. We already sanitize HTML, so in theory, the CSP is not necessary. In practice, of course, it’s good to have multiple layers of defence. Now, inline scripts are prevented from running at all on browsers that support CSP in response headers (which I think is most of them).
Raw HTML in Markdown code documentation is now disallowed as well.