A story of adding the two together
Once upon a time I was designing a report system for work. I was able to generate reports based on what we needed to look at on a daily basis using Racket. Writing HTML from Racket is pretty easy, so I spawned some very basic web pages using an `xexpr->file` function, which converts an S-expression into a nested XML document.
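The `xexpr->file` helper itself isn't shown in this post, so here's a minimal sketch of what such a function might look like, assuming the standard `xml` library; the body and the `#:exists` policy are my guesses, not the original code.

```racket
#lang racket/base
; A sketch of an xexpr->file helper, assuming the standard xml library;
; the real helper from the report system isn't shown in the post.
(require xml)

(define (xexpr->file xe path)
  ; xexpr->string renders a nested S-expression as XML/HTML text
  (with-output-to-file path
    (lambda () (display (xexpr->string xe)))
    #:exists 'replace))

; usage: a tiny report page as an xexpr
; (xexpr->file '(html (body (h1 "Daily Report"))) "report.html")
```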
Things were going well until I realized I had a common pattern in all my report programs: I was using `map` and `filter` a lot. Eventually I was going to run into a performance problem, since using both means iterating over the list twice. Each function is O(n) on its own, because the default structure in Lisp is a linked list.
Then I started to wonder, was there any way to mix the two functions with little performance overhead? Or were there any better ways of doing what I wanted to do?
`map` and `filter`

My first naive implementation looked like this, before I started to worry about performance overhead.
```racket
#lang racket/base

; Take #1: filter first, then map (two passes over the list)
(define (mapfilter m f d)
  (map m (filter f d)))
```
In my reports, I serialize a lot of CSV records into structs, and when I want to write to HTML, I use a `map` function to convert from struct to the list of elements I need for a report. But often I use a `filter` predicate to select a subset of results from my CSV, so I `filter` my structs first, then `map` my conversion after.
So here's a before and after using my `mapfilter`.
```racket
; before
(map CSV->ReportItem (filter valid-item? All-Items))

; after
(mapfilter CSV->ReportItem valid-item? All-Items)
```
A little bit easier on the eyes and a lot less nesting. It's less clear which function does what, but I can avoid nesting too many expressions and simplify my report writing a lot with this.
But after a while I started putting two and two together and realized this is where my double-iteration problem stems from. I needed a new approach.
I needed a new function that would do the mapping and filtering in one pass, so I devised a simple linear recursive function that does exactly that. It takes the same arguments as before, two functions and the data list, and uses a helper function internally to iterate over the list and collect results.
The main logic is as follows: if an element passes the `filter` predicate test, we `map` it to a new value and attach it to our results; if not, it gets skipped entirely. Let's put that into code.
```racket
; Take #2
; mapfilter :: (a -> b) -> (a -> Bool) -> [a] -> [b]
(define (mapfilter m f d)
  (define (inner-mf m f data acc)
    (if (null? data)
        acc
        (let ([cell (car data)])
          (inner-mf m f (cdr data)
                    (if (f cell) (cons (m cell) acc) acc)))))
  (reverse (inner-mf m f d '())))
```
Okay, so it looks a bit messier, but that tends to happen with recursive functional code. For tail-call optimization to kick in, the recursive call must be the last thing the function does, so we squeeze the accumulating logic into the arguments of the call to `inner-mf`.
Using `racket/trace`, we can check whether our function is actually being tail-call optimized.
```racket
; test module
(module+ test
  (require racket/trace
           (only-in racket/list range))
  (trace mapfilter)
  (mapfilter add1 even? (range 5)))
```

```
; output from using `raco test -x .` and racket/trace
>(mapfilter #<procedure:add1> #<procedure:even?> '(0 1 2 3 4))
<'(1 3 5)
'(1 3 5)
```
`trace` prints every call to a traced function. Since `mapfilter` shows up as a single flat entry rather than a ladder of nested calls, it does indeed run in constant stack space; it's being tail-call optimized.
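For contrast, here's a sketch of a non-tail-recursive version (the name `mapfilter-nontail` is made up for illustration); under `trace`, each pending `cons` shows up as a deeper level of nesting instead of a single flat entry.

```racket
#lang racket/base
; A deliberately non-tail-recursive mapfilter for comparison: the
; recursive call sits inside cons, so each call leaves work pending
; on the stack.
(require racket/trace)

(define (mapfilter-nontail m f data)
  (cond [(null? data) '()]
        [(f (car data))
         (cons (m (car data)) (mapfilter-nontail m f (cdr data)))]
        [else (mapfilter-nontail m f (cdr data))]))

(trace mapfilter-nontail)

; (mapfilter-nontail add1 even? '(0 1 2 3 4))
; trace shows one nesting level per pending call, because every
; unfinished cons keeps a stack frame alive until the list ends.
```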
At first my data set did not seem to yield any massive benefits. I tried using `time` to time my Racket program, and it seemed my dataset was not large enough to really take advantage of the new codepath. Maybe I'm optimizing prematurely.
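If I ever get around to measuring it properly, a rough benchmark sketch might look like this; the million-element list is an arbitrary stand-in, not my real report data.

```racket
#lang racket/base
; A rough benchmark sketch: compare two-pass map-over-filter against
; the single-pass mapfilter on a synthetic list. The list size is an
; arbitrary assumption, not the real dataset from the reports.
(require racket/list)

(define (mapfilter m f d)
  (define (inner data acc)
    (if (null? data)
        (reverse acc)
        (inner (cdr data)
               (if (f (car data)) (cons (m (car data)) acc) acc))))
  (inner d '()))

(define big (range 1000000))

(time (void (map add1 (filter even? big))))   ; two passes
(time (void (mapfilter add1 even? big)))      ; one pass
```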
Lastly, I had one more idea up my sleeve, though it wouldn't be for performance so much as for sanity.
We've covered what `map` does and what `filter` does, but there's a third category of list processing I haven't brought up yet: `reduce`.
```
map    :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
reduce :: (a -> b -> b) -> b -> [a] -> b
```
`reduce` is a function that goes by several names, but the idea is that it "reduces" a list into a single value of some type. Think of it as a `sum` function: it takes the `+` operator and feeds every element of the list to it until it reaches a final number, our sum.
```racket
(define our-sum (+ 1 2 3 4 5))
(define our-other-sum (apply + '(1 2 3 4 5)))
(define our-very-other-sum (+ 1 (+ 2 (+ 3 (+ 4 5)))))
```
The `+` function naturally accepts any number of arguments, just as `-` and the other math operators tend to. But at heart these are binary operators; they combine only two values at a time.
Somehow, Racket finds a way to compress our list of numbers into a series of binary applications of a function. If our number list were a long piece of paper from a printer with every number printed on it, Racket has a way of "folding" this very long piece of paper into something that resembles a series of additions or subtractions. We call these folds, and Racket provides them as `foldl` and `foldr`.
For example, let's try creating the `sum` function using `foldl`.
```racket
(define (sum listof-nums)
  (foldl + 0 listof-nums))

(sum '(1 2 3 4 5))
```
We supply an initial value to `foldl`, which acts as the starting accumulator for the folding. We can visualize how this works by writing it out.
```racket
(foldl + 0 '(1 2 3 4 5))
(foldl + (+ 0 1) '(2 3 4 5))
(foldl + (+ 2 (+ 0 1)) '(3 4 5))
(foldl + (+ 3 (+ 2 (+ 0 1))) '(4 5))
(foldl + (+ 4 (+ 3 (+ 2 (+ 0 1)))) '(5))
(foldl + (+ 5 (+ 4 (+ 3 (+ 2 (+ 0 1))))) '())
(foldl + (+ 5 (+ 4 (+ 3 (+ 2 1)))) '())
(foldl + (+ 5 (+ 4 (+ 3 3))) '())
(foldl + (+ 5 (+ 4 6)) '())
(foldl + (+ 5 10) '())
(foldl + 15 '())
> 15
```
We don't really want to use `foldr` here because it traverses the list in the opposite order: it has to reach the last element before it can start combining, so the recursion isn't in tail position and it uses more stack space overall.
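A quick way to see the order difference is to fold with `cons` itself: `foldl` walks the list front to back and so builds the result reversed, while `foldr` preserves the order at the cost of non-tail recursion.

```racket
#lang racket/base
; foldl consumes the list front-to-back, so consing onto the
; accumulator reverses it; foldr nests calls all the way to the end
; of the list first, preserving order but growing the stack.
(foldl cons '() '(1 2 3))  ; => '(3 2 1)
(foldr cons '() '(1 2 3))  ; => '(1 2 3)
```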
The big take-away is that `foldl` can do the iteration over lists for us and then "reduce" the list into whatever data type makes sense for us. It doesn't have to be a number; it can even be a list. We can even recreate `map` and `filter` using `foldl`.
```racket
(define (my-map m items)
  (define (reducer item acc)
    (cons (m item) acc))
  (reverse (foldl reducer '() items)))

(define (my-filter f items)
  (define (reducer item acc)
    (if (f item) (cons item acc) acc))
  (reverse (foldl reducer '() items)))
```
So cool: `foldl` can re-create `map` and `filter`. And if we toy with the code a little bit, we get something that merges the two into one brand new function: `mapfilter`.
```racket
#lang racket/base

; Take #3
(define (mapfilter m f items)
  (define (reducer item lst)
    (if (f item) (cons (m item) lst) lst))
  (reverse (foldl reducer '() items)))

; test
(module+ test
  (mapfilter add1 even? '(1 2 3 4 5)))

; output
; '(3 5)
```
Now that's a bit of an improvement. Far less gross recursion handled by myself, letting `foldl` take the load off.
What we have just done is take the burden of manually extracting items from a list off our chest and place it on `foldl`'s shoulders instead. Our initial value is an empty list, and if the current item passes our `filter` predicate test, then we `map` it and attach it to the head of the list.
It looks a lot cleaner and certainly isn't very hard to read. It's nice and simple and I'm pretty happy with it.
It's not a very large problem, and maybe the solution is a little obvious coming from a non-functional language, but I had fun investigating and figuring out how to better use the available Racket functions to clean up what was originally some very messy code. I have yet to really time the differences and impacts of each function, so maybe there's overhead I'm missing somewhere, but this was a fun little dive to say the least.
As a side note, there is a built-in function called `filter-map`, but it does not resemble the functionality that I would need. 😥
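For reference, `filter-map` keeps the non-`#f` results of a single function, so getting the same effect means fusing the predicate and the mapping into one lambda, which is a different shape from the two-function `mapfilter` above.

```racket
#lang racket/base
; filter-map applies one function and drops #f results, so the
; predicate and the mapping have to be combined into a single lambda
; rather than passed as two separate functions.
(require (only-in racket/list filter-map))

(filter-map (lambda (x) (and (even? x) (add1 x))) '(1 2 3 4 5))
; => '(3 5)
```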