GSoC 2017 - Purescript/Javascript interop for a genome browser

Posted on July 6, 2017 by Christian Fischer

During last year’s Google Summer of Code, I worked on embedding the Biodalliance (BD) genome browser in GeneNetwork 2. More info about that can be found in this blog post. This year, again with the support of the Open Bioinformatics Foundation, I am writing a genome browser in Purescript (PS), which also uses BD. The need to keep using BD is one of the reasons Purescript was chosen, as it has excellent support for interop with Javascript (JS).

The goal of this new browser is, in short, to show GWAS, QTL, and Cytoscape data, in a way that allows for exploration of the various data in terms of one another.

This post will give an overview of what the project looks like, as well as describe the adapter libraries from Purescript to Biodalliance and Cytoscape.js. The next post will concern the new browser as a whole.

What we need

The browser will (initially) consist of a track-based genome browser as well as a graph network viewer. The two parts will communicate with each other, and be manipulated from a controlling component.

To minimize the need to write things from scratch, BD is used as the genome browser (however a Purescript-based browser is planned), and Cytoscape.js (Cy.js) as the graph network viewer. Both BD and Cy.js are written in Javascript, exposing JS APIs, yet I want to use them from Purescript, and integrate them within a Purescript system. This is where Purescript’s Foreign Function Interface (FFI) comes in, as it makes it easy to call Javascript from Purescript and vice versa.

The rest of the post describes the JS APIs of both BD and Cy.js, followed by an overview of the APIs that have been written to interface them with PS.

Biodalliance

Biodalliance is a powerful Javascript-based genome browser, used by e.g. Sanger here: http://www.uk10k.org/dalliance.html. It can read many different data formats, and can be configured to display data in various ways. It also has a UI for manipulating the viewport, adding/removing tracks, exporting the view as various formats, and more.

For this project, only the track browser itself is interesting, as the main PS app will take care of the rest. To do this, we can use BD’s API.

BD API

BD exposes a JS API to do things such as move the view across the genome, add/remove tracks, and add event listeners so functions can be called when a track feature is clicked on.

We want to let PS code control the viewport and add callbacks to deal with events. The BD tracks are also configured from Purescript, using a simple wrapper over BD’s JSON configuration system, which I won’t go into here.

I have also written an interface for defining glyphs and renderers in Purescript, which simplifies customizing the look of a track. However, that is for another blog post.

The BD/Purescript interface

The PS interface consists of functions for creating a BD browser instance, adding callbacks for when the browser is created and when a feature is clicked, as well as for moving the view.

Let’s start with the types:

foreign import data Biodalliance :: Type
foreign import data BD :: Effect

The `Biodalliance` type represents a Biodalliance browser, and can only be created via the FFI function wrapping the BD browser constructor. The `BD` effect is used to label all BD-related effects in Eff, for example creating a new instance, or scrolling an existing instance.

Creating the BD browser is not particularly interesting; it only takes the BD configuration and constructor as arguments, and returns the result of calling the constructor with the configuration.

Moving the view is a better example. Here is the definition for moving the BD view to some position, given a chromosome, and a minimum and maximum position in basepairs:

foreign import setLocationImpl :: ∀ eff.
                                  Biodalliance
                                -> Chr -> Bp -> Bp
                                -> Eff (bd :: BD | eff) Unit

setLocation :: ∀ c eff. HCoordinate c =>
                Biodalliance
            -> Chr -> c -> c
            -> Eff (bd :: BD | eff) Unit
setLocation bd chr xl xr = setLocationImpl bd chr (bp xl) (bp xr)

Now, `HCoordinate` is a typeclass for anything that can be converted to basepairs or megabasepairs, so while BD uses Bp as units, the PS wrapper function `setLocation` doesn’t care, and converts automatically.

Here is the FFI definition of `setLocationImpl`:

exports.setLocationImpl = function(bd) {
    return function(chr) {
        return function(xl) {
            return function(xr) {
                return function() {
                    bd.setLocation(chr, xl, xr);
                };
            };
        };
    };
};

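The nesting is there because PS only deals with curried functions, so FFI wrappers must also be curried (there are libraries that make this more pleasant to deal with). The innermost function, which takes no arguments, is how an Eff action is represented in JS: the call into BD only happens when that function is run. Other than that it is straightforward.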
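To make the currying concrete, here is a minimal Javascript sketch of how such a wrapper behaves when called by hand. The `fakeBd` object is a hypothetical stand-in for a real Biodalliance instance that only records calls, and the wrapper is declared as a plain `const` rather than attached to `exports` so that the snippet runs on its own.

```javascript
// Same shape as the FFI wrapper above, written with arrow functions.
const setLocationImpl = (bd) => (chr) => (xl) => (xr) => () => {
  bd.setLocation(chr, xl, xr);
};

// Hypothetical stand-in for a Biodalliance browser; it only records calls.
const fakeBd = {
  calls: [],
  setLocation(chr, min, max) {
    this.calls.push([chr, min, max]);
  },
};

// Supplying all four arguments only builds the effect; nothing has run yet.
const effect = setLocationImpl(fakeBd)("chr11")(1000000)(2000000);
const callsBeforeRun = fakeBd.calls.length;

// Invoking the zero-argument function performs the effect.
effect();
const callsAfterRun = fakeBd.calls.length;
```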

Cytoscape.js

Cytoscape.js is a graph network analysis and visualization library, and is a Javascript implementation of the Cytoscape platform. We use it for visualizing graph-based data and networks.

Cy.js API

Cy.js exposes a powerful Javascript API, letting programmers work with the whole graph or subsets of it, filter collections, control the viewport, and more. In this project it is currently used for manipulating the contents of the graph by filtering the nodes and edges.

The Cy.js API is divided into types, where some functions are associated with the entire graph, a single element, a collection of edges, and so on. There are also layout and animation types.

The Cy.js/Purescript interface

As even the subset of the Cy.js API that we’re currently interested in is much larger than the BD API, I’ll only go into a fairly small subset of the PS API here.

Types

The first two types are exactly analogous to those in the BD API. The rest of the `foreign import data` types are used to work with elements (either nodes or edges), events (as sent directly from Cy.js) and collections (which can currently only be created as containing Elements, despite the type signature).

foreign import data Cytoscape :: Type
foreign import data CY :: Effect

foreign import data Element :: Type
foreign import data CyEvent :: Type

foreign import data CyCollection :: Type -> Type

newtype Layout = Layout String

The `Layout` type is a newtype, as it is sent directly to the Cy.js functions which take a layout name as parameter — to JS, it is simply a String:

circle :: Layout
circle = Layout "circle"

foreign import runLayout :: forall eff.
                            Cytoscape
                         -> Layout
                         -> Eff (cy :: CY | eff) Unit

exports.runLayout = function(cy) {
    return function(layout) {
        return function() {
            cy.layout({name: layout}).run();
        };
    };
};
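Because a Purescript newtype has no runtime wrapper, the JS side really does receive a plain string. The following sketch illustrates this with `fakeCy`, a hypothetical stand-in for a Cytoscape.js instance that records which layout was requested; the wrapper is a plain `const` instead of an `exports` field so the snippet runs standalone.

```javascript
// Same shape as the runLayout wrapper above.
const runLayout = (cy) => (layout) => () => {
  cy.layout({ name: layout }).run();
};

// Hypothetical stand-in for a Cytoscape.js instance; it records the
// requested layout options and counts how many layouts were run.
const fakeCy = {
  requested: [],
  runs: 0,
  layout(options) {
    this.requested.push(options);
    return { run: () => { this.runs += 1; } };
  },
};

// What `circle :: Layout` looks like once it reaches Javascript.
const circle = "circle";
runLayout(fakeCy)(circle)();
```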

Collections

Most of the PS API, as it is now, deals with collections. It’s important that they be easy to work with from Purescript, even though they’re basically opaque, and must be manipulated via the Cy.js API. Even so, some useful abstractions have been made, and it is easy to work with the filtering system.

For example, `CyCollection` is a Semigroup, whose binary operation is the union of the two collections:

instance semigroupCyCollection :: Semigroup (CyCollection e) where
  append = union

exports.union = function(a) {
    return function(b) {
        return a.union(b);
    };
};

The `filter` function is defined as follows, together with a couple of FFI-defined predicates:

foreign import filter :: forall e.
                         (e -> Boolean)
                      -> CyCollection e
                      -> CyCollection e

foreign import isNode :: Element -> Boolean
foreign import isEdge :: Element -> Boolean
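The Javascript side of these imports is not shown in this post. A plausible sketch follows, assuming Cytoscape.js 3.x, where the `filter` callback receives the element as its first argument. The implementations are guesses rather than the actual ones, and arrays of hypothetical mock elements stand in for real collections so that the snippet runs on its own.

```javascript
// Assumed FFI implementations; the real ones may differ.
const filter = (pred) => (coll) => coll.filter((el) => pred(el));
const isNode = (el) => el.isNode();
const isEdge = (el) => el.isEdge();

// Hypothetical mock elements exposing only the two methods used above.
const mkElement = (id, group) => ({
  id,
  isNode: () => group === "nodes",
  isEdge: () => group === "edges",
});

const coll = [
  mkElement(1, "nodes"),
  mkElement(2, "nodes"),
  mkElement(3, "edges"),
];

const nodeIds = filter(isNode)(coll).map((el) => el.id);
const edgeIds = filter(isEdge)(coll).map((el) => el.id);
```

Wrapping the predicate in `(el) => pred(el)` keeps any extra callback arguments away from the curried PS function, which only expects one.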

Predicates are also easy to write, as there is a function `elementJObject :: Element -> JObject`, where `JObject` is a simple map from keys to JSON values. For example, a predicate that returns true if the ID of an element is even:

evenId :: Element -> Boolean
evenId el = case (elementJObject el) .? "id" of
  Left _  -> false
  Right i -> i `mod` 2 == 0

In Purescript, any function of the type (a -> Boolean) is an instance of the `HeytingAlgebra` typeclass, which covers anything Boolean-esque, with operators such as &&, ||, and not. This lets us combine predicates easily. Here is one that is true only for nodes with even IDs (so filtering with it removes all edges), and one that is true for edges with even IDs and nodes with odd IDs:

evenNodes = evenId && isNode
evenEdgesOddNodes = (evenId && isEdge) || ((not evenId) && isNode)
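What the `HeytingAlgebra` instance for functions does can be sketched in plain Javascript: each operator is applied pointwise, to the results of both predicates on the same input. The helper names `andP`, `orP` and `notP`, and the mock elements with a numeric `id` and a `kind` field, are made up for this illustration.

```javascript
// Pointwise versions of &&, || and not for one-argument predicates.
const andP = (f) => (g) => (x) => f(x) && g(x);
const orP = (f) => (g) => (x) => f(x) || g(x);
const notP = (f) => (x) => !f(x);

// Predicates over hypothetical mock elements.
const evenId = (el) => el.id % 2 === 0;
const isNode = (el) => el.kind === "node";
const isEdge = (el) => el.kind === "edge";

const evenNodes = andP(evenId)(isNode);
const evenEdgesOddNodes = orP(andP(evenId)(isEdge))(andP(notP(evenId))(isNode));

const elements = [
  { id: 1, kind: "node" },
  { id: 2, kind: "node" },
  { id: 3, kind: "edge" },
  { id: 4, kind: "edge" },
];

const evenNodeIds = elements.filter(evenNodes).map((el) => el.id);
const mixedIds = elements.filter(evenEdgesOddNodes).map((el) => el.id);
```

With these four elements, `evenNodeIds` contains only 2, and `mixedIds` contains 1 and 4.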

Combining the Semigroup with the filtering, we can make a complicated no-op:

coll :: CyCollection Element

edges = filter isEdge coll
nodes = filter isNode coll

coll == edges `append` nodes -- > true
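The same check can be mimicked in Javascript with a small stand-in for a collection. The `MockCollection` class below is hypothetical and keyed by element ID; real Cy.js collections are far richer. It only illustrates why splitting a collection by a predicate and its complement, then taking the union, gives back the original.

```javascript
// Hypothetical stand-in for a Cy.js collection, keyed by element id.
class MockCollection {
  constructor(elements) {
    this.byId = new Map(elements.map((el) => [el.id, el]));
  }
  // Keep only the elements for which the predicate holds.
  filter(pred) {
    return new MockCollection([...this.byId.values()].filter((el) => pred(el)));
  }
  // Set union; duplicate ids collapse because of the Map.
  union(other) {
    return new MockCollection([...this.byId.values(), ...other.byId.values()]);
  }
  // True when both collections contain exactly the same ids.
  same(other) {
    return (
      this.byId.size === other.byId.size &&
      [...this.byId.keys()].every((id) => other.byId.has(id))
    );
  }
}

const isNode = (el) => el.group === "nodes";
const isEdge = (el) => el.group === "edges";

const coll = new MockCollection([
  { id: "a", group: "nodes" },
  { id: "b", group: "nodes" },
  { id: "ab", group: "edges" },
]);

const edges = coll.filter(isEdge);
const nodes = coll.filter(isNode);
const roundTrip = edges.union(nodes).same(coll);
```

This works because every element is either a node or an edge, so the two filtered collections are disjoint and together cover everything in `coll`.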