A story of adding the two together
Once upon a time I was designing a report system for work. I was able to generate reports based on what we needed to look at on a daily basis using Racket. Writing HTML from Racket is pretty easy, so I spawned some very basic web pages using an `xexpr->file` function, which converts an S-expression into a nested XML document.
Things were going well until I realized I had a common pattern in all my report programs: I was using `map` and `filter` together a lot. Eventually I was going to run into a performance problem, since using both means iterating over the list twice. Both functions are O(n), since the default structure in Lisp is a linked list.
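To make the double iteration concrete, we can wrap both functions in a call counter (the `counted` helper here is just for this illustration) and watch the predicate touch every element before the mapper touches the survivors:

```racket
#lang racket/base

;; Count how many times any instrumented function gets called.
(define calls 0)
(define (counted f)
  (lambda (x)
    (set! calls (add1 calls))
    (f x)))

;; filter touches all 5 elements, then map touches the 3 that
;; passed, for 8 calls total instead of 5.
(define result
  (map (counted add1) (filter (counted even?) '(0 1 2 3 4))))

result ; '(1 3 5)
calls  ; 8
```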
Then I started to wonder, was there any way to mix the two functions with little performance overhead? Or were there any better ways of doing what I wanted to do?
My first naive implementation, before I started to worry about performance overhead, looked like this:
```racket
#lang racket/base

(define (mapfilter m f d)
  (map m (filter f d)))
```
In my reports, I serialize a lot of CSV records into structs, and when I want to write to HTML, I use a `map` function to convert from struct to a list of elements I would need for a report. But oftentimes I use a `filter` predicate function to select a list of results from my CSV, so I `filter` my structs first, then I `map` my conversion after.
So here's a before and after using my new function:

```racket
; before
(map CSV->ReportItem (filter valid-item? All-Items))

; after
(mapfilter CSV->ReportItem valid-item? All-Items)
```
A little easier on the eyes and a lot less nesting. It's less obvious which function does what, but I avoid nesting too many expressions, and it simplifies my report writing a lot.
But after a while I started putting two and two together and realized this is where my double-iteration problem stems from. I needed a new approach.
I needed a new function that would do mapping and filtering, so I devised a simple linearly recursive function that does exactly that. It takes two functions and the data list, the same arguments as before, and uses a helper function internally to iterate over the list and collect results.
The main logic is as follows: if an element passes the `filter` predicate test, we `map` it to a new value and attach it to our results; if not, it gets skipped over entirely. Let's start putting that into code now.
```racket
; Take #2
; mapfilter :: (a -> b) -> (a -> Bool) -> [a] -> [b]
(define (mapfilter m f d)
  (define (inner-mf m f data acc)
    (if (null? data)
        acc
        (let ([cell (car data)])
          (inner-mf m f (cdr data)
                    (if (f cell)
                        (cons (m cell) acc)
                        acc)))))
  (reverse (inner-mf m f d '())))
```
Okay, so it looks a bit messier, but that always happens with functional programming and recursion. To take advantage of tail-call optimization, the recursive call must be the last thing the function does, so we squeeze the accumulation logic into the arguments of that call. With `racket/trace`, we can check whether our function really is being tail-called.
```racket
; test module
(module+ test
  (require (only-in racket/list range))
  (mapfilter add1 even? (range 5)))
```

Output from using `raco test -x .` and `racket/trace`:

```
>(mapfilter #<procedure:add1> #<procedure:even?> '(0 1 2 3 4))
<'(1 3 5)
'(1 3 5)
```
`trace` shows us the call history of a user function: a single entry and a single exit, with no nested calls, means our `mapfilter` runs in a single stack frame and is indeed being tail-call optimized.
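For contrast, here's a sketch of what a non-tail-recursive version would look like. Because the recursive call sits inside `cons`, every frame still has work to do after the call returns, and `trace` would show nested entries (`> >> >>>`) instead of the flat trace above:

```racket
#lang racket/base
(require racket/trace)

;; Not tail-recursive: the recursive call is wrapped in cons,
;; so each stack frame must wait for the next one to finish.
(define (mapfilter-nontail m f data)
  (cond [(null? data) '()]
        [(f (car data))
         (cons (m (car data)) (mapfilter-nontail m f (cdr data)))]
        [else (mapfilter-nontail m f (cdr data))]))

(trace mapfilter-nontail)
(mapfilter-nontail add1 even? '(0 1 2 3 4))
```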
At first my data set did not seem to yield any massive benefits. I tried using `time` to time my Racket program, and it seemed that my dataset was not large enough to really take advantage of these new code paths, unfortunately. Maybe I'm prematurely optimizing.
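If I ever get around to measuring properly, a sketch like this would compare the two-pass and one-pass versions with `time` (the million-element list size is arbitrary, just big enough to be measurable):

```racket
#lang racket/base
(require (only-in racket/list range))

;; The single-pass version from above.
(define (mapfilter m f d)
  (define (inner m f data acc)
    (if (null? data)
        acc
        (inner m f (cdr data)
               (if (f (car data))
                   (cons (m (car data)) acc)
                   acc))))
  (reverse (inner m f d '())))

(define big (range 1000000)) ; arbitrary test size

;; `time` prints cpu/real/gc milliseconds and returns the value.
(define two-pass (time (map add1 (filter even? big))))
(define one-pass (time (mapfilter add1 even? big))))
```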
Lastly, I had one more idea in my book, but it wouldn't be for performance so much as for sanity.
We've covered what `map` does and we've covered what `filter` does, but there's a third category of list processing that I have not brought up:

```
map    :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
reduce :: (a -> b -> b) -> b -> [a] -> b
```
`reduce` is a function that goes by many names (fold among them), but the idea is that it serves the purpose of "reducing" a list into a single value of some sort. Think of a `sum` function: it takes the `+` operator and throws all elements of a list at it until it reaches a final number, our sum.
```racket
(define our-sum (+ 1 2 3 4 5))
(define our-other-sum (apply + '(1 2 3 4 5)))
(define our-very-other-sum (+ 1 (+ 2 (+ 3 (+ 4 5)))))
```
The `+` function naturally accepts any number of arguments, just as `-` and the other math operators tend to. But they didn't start out like that: at heart they are binary operators, meaning they combine only two values at a time.
Somehow, Racket finds a way to compress our list of numbers into a series of binary applications of a function. If our number list were a long piece of paper from a printer with every number printed on it, Racket has a way of "folding" this very long piece of paper into something that resembles a series of additions or subtractions. We call these folds, which Racket provides as `foldl` and `foldr`.
For example, let's try creating the `sum` function using `foldl`:

```racket
(define (sum listof-nums)
  (foldl + 0 listof-nums))

(sum '(1 2 3 4 5))
```
We supply an initial value to `foldl`, which acts as the first value in our folding. We can visualize how this will work by writing it out.
```racket
(foldl + 0 '(1 2 3 4 5))
(foldl + (+ 0 1) '(2 3 4 5))
(foldl + (+ 2 (+ 0 1)) '(3 4 5))
(foldl + (+ 3 (+ 2 (+ 0 1))) '(4 5))
(foldl + (+ 4 (+ 3 (+ 2 (+ 0 1)))) '(5))
(foldl + (+ 5 (+ 4 (+ 3 (+ 2 (+ 0 1))))) '())
(foldl + (+ 5 (+ 4 (+ 3 (+ 2 1)))) '())
(foldl + (+ 5 (+ 4 (+ 3 3))) '())
(foldl + (+ 5 (+ 4 6)) '())
(foldl + (+ 5 10) '())
(foldl + 15 '())
> 15
```
We don't really want to use `foldr` here because it traverses the list in the opposite order: it starts from the last element and builds up, which means the recursion can't be a tail call, so it uses more stack space and is less efficient overall.
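We can see the order difference directly by using `cons` as the combining function: `foldl` builds the result list reversed, while `foldr` preserves the original order (at the cost of non-tail recursion):

```racket
#lang racket/base

;; foldl consumes the list front to back, so the last element
;; ends up at the head of the accumulated list.
(foldl cons '() '(1 2 3 4 5)) ; → '(5 4 3 2 1)

;; foldr combines from the back, preserving order.
(foldr cons '() '(1 2 3 4 5)) ; → '(1 2 3 4 5)
```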
The big take-away is that `foldl` can do the iteration over lists for us and then "reduce" the list into a data type that makes sense for us. It doesn't have to be a number; it can even be a list. We can even recreate `map` and `filter`:
```racket
(define (my-map m items)
  (define (reducer item acc)
    (cons (m item) acc))
  (reverse (foldl reducer '() items)))

(define (my-filter f items)
  (define (reducer item acc)
    (if (f item) (cons item acc) acc))
  (reverse (foldl reducer '() items)))
```
So `foldl` can re-create both `map` and `filter`, but if we start to toy with the code a little bit, we get something that merges `map` and `filter` into one brand new function:
```racket
#lang racket/base

; Take #3
(define (mapfilter m f items)
  (define (reducer item lst)
    (if (f item)
        (cons (m item) lst)
        lst))
  (reverse (foldl reducer '() items)))

; test
(module+ test
  (mapfilter add1 even? '(1 2 3 4 5)))

; output
; '(3 5)
```
Now that's a bit of an improvement. Far less gross recursion handled by myself, letting `foldl` take the load off.
What we have just done is take the burden of manually extracting items from a list off our chest and place it on `foldl`'s shoulders instead. Our initial value is an empty list, and if our current item passes our `filter` predicate test, then we `map` it and attach it to the head of the list.
It looks a lot cleaner and certainly isn't very hard to read. It's nice and simple and I'm pretty happy with it.
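Tying it back to the report use case, a toy version of the pipeline might look like this. The `record` struct and the `valid-item?` and `record->row` helpers here are made-up stand-ins for illustration, not my real report code:

```racket
#lang racket/base

;; Stand-in for a parsed CSV record (hypothetical).
(struct record (name amount) #:transparent)

;; Hypothetical predicate and converter.
(define (valid-item? r) (> (record-amount r) 0))
(define (record->row r)
  (list 'tr
        (list 'td (record-name r))
        (list 'td (number->string (record-amount r)))))

;; Take #3 from above.
(define (mapfilter m f items)
  (define (reducer item lst)
    (if (f item) (cons (m item) lst) lst))
  (reverse (foldl reducer '() items)))

(define rows
  (mapfilter record->row valid-item?
             (list (record "a" 10) (record "b" -1) (record "c" 3))))

rows ; only records "a" and "c" survive, converted to table rows
```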
It's not a very large problem, and maybe it's a little obvious in a non-functional language, but I had fun investigating and figuring out how to better use the available Racket functions to clean up what was originally some very messy code. I have yet to really time the differences and impacts of each function, so maybe there's overhead I'm missing somewhere, but this was a fun little dive, to say the least.
As a side note, there is a function called `filter-map`, but it does not resemble the functionality that I would need. 😥
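For completeness, here's what `filter-map` from `racket/list` actually does: it applies a single function and keeps only the non-`#f` results, so the filtering is driven by the mapped value's truthiness rather than by a separate predicate:

```racket
#lang racket/base
(require (only-in racket/list filter-map))

;; One function does both jobs: return a value to keep it, #f to drop it.
(filter-map (lambda (n) (and (even? n) (add1 n))) '(0 1 2 3 4))
;; → '(1 3 5)
```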