After a busy week for me and my co-authors, I'm happy to share three new paper drafts.
- José Pedro Magalhães and Andres Löh. Hierarchy in Generic Programming Libraries. Submitted to Mathematics of Program Construction (MPC 2015). [ pdf ]
Generic programming (GP) is a form of abstraction in programming languages that serves to reduce code duplication by exploiting the regular structure of algebraic datatypes. Several different approaches to GP in Haskell have surfaced, giving rise to the problem of code duplication across GP libraries. Given the original goals of GP, the is a rather unfortunate turn of events. Fortunately, we can convert between the different representations of each approach, which allows us to "borrow" generic functions from different approaches, avoiding the need to reimplement every generic function in every single GP library.
In previous work we have shown how existing GP libraries relate to each other. In this paper we go one step further and advocate "hierarchical GP": through proper design of different GP approaches, each library can fit neatly in a hierarchy, greatly minimizing the amount of supporting infrastructure necessary for each approach, and allowing each library to be specific and concise, while eliminating code duplication overall. We introduce a new library for GP in Haskell intended to sit at the top of the "GP hierarchy". This library contains a lot of structural information, and is not intended to be used directly. Instead, it is a good starting point for generating generic representations for other libraries. This approach is also suitable for being the only library with native compiler support; all other approaches can be obtained from this one by simple conversion of representations in plain Haskell code.
- Jeroen Bransen and José Pedro Magalhães. Generic Representations of Tree Transformations. Journal version of the WGP'13 paper, submitted to Journal of Functional Programming. [ pdf ]
Applications which deal with editing of structured data over multiple iterations, such as structure editors, exercise assistants, or applications which support incremental computation, need to represent transformations between different versions of data. A general notion of "transformation" should be more informative than what is obtained by computing the difference between the old and the new term, as diff algorithms generally consider only insert, copy, and delete operations. Transformations, however, may involve swapping elements, or duplicating them, and a good representation of transformations should not involve unnecessary repetition of data, or lose shared structure between the old and new term.
In this paper we take a detailed look at the notion of transformation on strongly-typed structures. Our transformations are datatype-generic, and thus can be applied to a large class of data structures. We focus on representing transformations in a way that maximally captures the common substructure between the old and new term. This is of particular importance to a specific class of applications: that of incremental computations which recompute information only for the parts of the tree that are affected by a transformation.
We present a library for encoding such transformations over families of datatypes, together with an algorithm for computing a transformation from one term into another, while retaining shared common substructures. We provide practical examples of how to encode transformations, as well as a realistic application to computing transformations between different revisions of a program.
The most widely used generic-programming system in the Haskell community, Scrap Your Boilerplate (SYB), also happens to be one of the slowest. Generic traversals in SYB are often an order of magnitude slower than equivalent handwritten, non-generic traversals. Thus while SYB allows the concise expression of many traversals, its use incurs a significant runtime cost. Existing techniques for optimizing other generic-programming systems are not able to eliminate this overhead.
This paper presents an optimization that eliminates this cost. Essentially, it is a partial evaluation that takes advantage of domain-specific knowledge about the structure of SYB. It optimizes SYB traversals to be as fast as handwritten, non-generic code, and benchmarks show that this optimization improves the speed of SYB traversals by an order of magnitude or more.