Making naive mistakes with Racket, and fixing them
Today I am going to go over how my lack of understanding of a component I use every day led me to leak file descriptors without ever closing them. What is a file descriptor? It's the handle a process uses to refer to a file, pipe, or network socket it has open. Linux keeps track of which process has which files and sockets open, and performs the reads and writes on behalf of each process.
File descriptors, however, are not infinite, and if you don't account for them, your program can crash once the operating system refuses to hand it any more.
The root cause of my file descriptor pains is mainly my use of `subprocess` in Racket. The goal I had in mind was to navigate several hundred web pages quickly, rather than manually clicking through each page myself.
I reckon automating this saves me several seconds per page, and I've used this program on and off for a long time, so I must have saved a lot of hours by now.
`subprocess` returns a `subprocess?` object that I can interact with. Mostly I just call `subprocess-wait` on it, which blocks the thread until the subprocess has finished its job. The job is that I ask my browser, Firefox, to open a web page as a new tab. You can do this yourself by opening up your terminal and punching in:
```shell
$ firefox https://duckduckgo.com/
# ...
# takes you to duckduckgo in firefox
```
In Racket, the same thing looks like this:
```racket
#lang racket/base

(define firefox (find-executable-path "firefox"))

(define (ff-open url)
  (define-values (s i o e)
    (subprocess #f #f 'stdout firefox url))
  (subprocess-wait s))

; open pages
(ff-open "https://joinmastodon.org/")
(ff-open "https://duckduckgo.com/")
```
The case can be made for using `xdg-open` instead of `firefox`, since `xdg-open` picks an application based on a MIME-type query of its argument, but `xdg-open` tends to pick Chromium a lot more, which I install irregularly when something doesn't work. I much prefer browsing in Firefox, so I use that instead, but you can choose what works for you.
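The `define-values` part of my code receives four values: the `subprocess?` object, an input port from which we can read the child's standard output, an output port through which we can write to the child's standard input, and finally the child's standard error port. These ports are pipes the operating system sets up between Racket and the child process. Because we passed `'stdout` as the third argument, the child's standard error is merged into its standard output, and the fourth value `e` comes back as `#f`.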
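To see which value is which without opening a browser, here is a minimal sketch that swaps `firefox` for `echo` (an assumption: any small command on your PATH that prints something will do). It reads the child's output from `i`, checks the port types, and shows that `e` is `#f` when `'stdout` is passed:

```racket
#lang racket/base
;; Sketch: inspect the four values subprocess returns.
;; Assumption: an `echo` executable is on the PATH (true on typical Linux
;; systems); it stands in for firefox so there is some output to read.
(define echo-exe (find-executable-path "echo"))

(define-values (s i o e)
  ;; stdout #f and stdin #f: Racket creates a pipe for each.
  ;; stderr 'stdout: the child's stderr is merged into its stdout.
  (subprocess #f #f 'stdout echo-exe "hello"))

(define line (read-line i))    ; i is an input port fed by the child's stdout
(subprocess-wait s)

(printf "child wrote: ~a\n" line)
(printf "i is an input port: ~a\n" (input-port? i))
(printf "o is an output port: ~a\n" (output-port? o))  ; writes go to the child's stdin
(printf "e: ~a\n" e)                                     ; #f because of 'stdout

;; Release both pipe ends (the step the original ff-open skips)
(close-input-port i)
(close-output-port o)
```

On a typical Linux box this should print `hello` followed by `#t`, `#t` and `#f`.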
However, this naive approach assumes Racket will manage these objects for us once they go out of lexical scope. And I can say with some certainty that it does not.
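I checked the official Racket docs for `subprocess` and found this note: all ports returned from `subprocess` must be explicitly closed, usually with `close-input-port` or `close-output-port`.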
Whoops! This thing I have ignored for so long is actually blowing up my file descriptors. I've been using `subprocess` for so long I never even bothered to check file descriptors.
But before I share a fix, let's talk about what happens. As I'm going about my day opening up a metric shit ton of web pages, I notice a program crash. After 500-something web pages, Racket crashes and tells me it cannot open any more files: we've effectively hit our limit.
A common default limit on Linux is 1024 open file descriptors per process (the soft limit reported by `ulimit -n`). It can be raised if a process needs more, which is typical for high-throughput network servers, but my computer is a mere desktop and I know I never touched it. So I have reason to think that for every `subprocess` I invoked, two file descriptors were created, representing the child's standard input and standard output.
To test this, I need a combination of tools to check my file descriptor count. Each running process has an `fd` directory under `/proc/<pid>/` that lists every file descriptor it currently has open. We can count the entries in that directory ourselves, or we can use `lsof` to grab the number for us.
```shell
# launch in term #1
$ racket WebScript.rkt

# now in term #2
$ ps aux | grep racket
steve    12345  7.5  1.3 ... racket WebScript.rkt

# that 12345 is the pid, use it with lsof
$ lsof -a -p 12345 | wc -l
20
# 20 is how many FDs we have opened right now
```
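One caveat with `lsof`: it prints a header line and also lists entries that are not file descriptors (the current directory, the executable, memory-mapped libraries), so its total runs a bit higher than the real FD count, even though the per-page differences still hold. Counting the entries in `/proc/12345/fd` gives the exact number, and a Racket program can do the same for itself through `/proc/self/fd`. A minimal sketch, assuming Linux:

```racket
#lang racket/base
;; Sketch: count this Racket process's own open file descriptors from the
;; inside. Assumption: Linux, where /proc/self/fd has one entry per open
;; descriptor (the same listing as /proc/<pid>/fd for the running process).
(define (count-open-fds)
  ;; directory-list briefly opens the directory itself, so that handle may
  ;; show up in the count; it adds the same offset on every call
  (length (directory-list "/proc/self/fd")))

(printf "open file descriptors: ~a\n" (count-open-fds))
```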
After opening one web page, we have twenty file descriptors in use. We can watch the count grow as pages are opened one at a time with my basic program (it waits for me to hit Enter before opening the next web page).
```shell
$ lsof -a -p 12345 | wc -l
20
$ lsof -a -p 12345 | wc -l
22
$ lsof -a -p 12345 | wc -l
24
```
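The same growth can be reproduced without Firefox. This sketch keeps the original `ff-open` shape but runs `true` (assumed to be on the PATH) as the child, and measures the descriptor count from inside the process via `/proc/self/fd`, so it is Linux-only:

```racket
#lang racket/base
;; Sketch: reproduce the "+2 per page" pattern without a browser.
;; Assumptions: Linux (/proc/self/fd) and a `true` executable on the PATH,
;; standing in for firefox.
(define true-exe (find-executable-path "true"))

(define (count-open-fds)
  (length (directory-list "/proc/self/fd")))

;; Same shape as the original ff-open: the returned ports are never closed
(define (leaky-open)
  (define-values (s i o e) (subprocess #f #f 'stdout true-exe))
  (subprocess-wait s))

(leaky-open)   ; warm-up: the first subprocess call may open internal
               ; descriptors that stay open for the life of the process
(define before (count-open-fds))
(for ([_ (in-range 5)]) (leaky-open))
(define leaked (- (count-open-fds) before))
(printf "descriptors gained over 5 calls: ~a\n" leaked)
```

If the explanation in this post is right, this should report roughly two descriptors per call.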
Each web page opens up two, which makes sense given the input and output ports. But they never close, just like the documentation said! So if we started with 20 FDs open after one web page, it stands to reason that after 503 web pages we will hit our cap of 1024 FDs, causing the program to crash!
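Spelled out, with n the number of pages opened, 20 descriptors after the first page, 2 more per page, and the 1024 limit from earlier:

```latex
\mathrm{FDs}(n) = 20 + 2(n - 1)
\qquad
\mathrm{FDs}(n) = 1024 \;\Longrightarrow\; n = \frac{1024 - 20}{2} + 1 = 503
```

Since `lsof` slightly overcounts real descriptors, the actual crash may land a few pages later, which still fits the "500-something" I saw.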
The fix unfortunately isn't as interesting as the math behind figuring out our file descriptor problem. Fortunately, being easy to fix is great for us coders, because we hate spending long amounts of time fixing issues!
Let's re-examine the code and picture exactly what went wrong.
```racket
(define (ff-open url)
  (define-values (s i o e)
    (subprocess #f #f 'stdout firefox url))
  (subprocess-wait s))
```
What this function does is what I covered before: it tells Firefox (or whichever browser you use) to open a web URL. It binds four values, the `subprocess?` object plus the ports (or `#f`) that `subprocess` hands back, and then returns the result of `subprocess-wait`, which is `#<void>` and doesn't matter.
The naivety comes from the idea that text ports are managed by the garbage collector, which does not seem to be the case in Racket. The docs note that ports are placed into the care of the custodian, and as such are probably immune to regular GC calls.
A custodian, if my theory is correct, is only shut down when the Racket process is completely done and over. Closing things like TCP streams and file ports from the garbage collector doesn't make much sense, so they are probably left for a custodian to clean up when it is time for the program to shut down. Any dangling file descriptors the user hasn't closed are then cleaned up by the custodian.
In this scenario with my bulk web page opening program, the custodian is left untouched, and he never closes anything: my program doesn't end, so he has no way of knowing when to act.
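The theory that nobody closes these ports until their custodian is shut down is easy to check in isolation. A minimal sketch, assuming a `true` executable on the PATH in place of `firefox`: it creates the ports under a fresh custodian, then asks `port-closed?` before and after `custodian-shutdown-all`:

```racket
#lang racket/base
;; Sketch: ports created while a custodian is current are managed by that
;; custodian, and shutting it down closes them.
;; Assumption: a `true` executable on the PATH stands in for firefox.
(define true-exe (find-executable-path "true"))
(define cust (make-custodian))   ; a subordinate of the current custodian

(define-values (s i o e)
  (parameterize ([current-custodian cust])
    (subprocess #f #f 'stdout true-exe)))
(subprocess-wait s)

(printf "closed before shutdown? ~a\n" (port-closed? i))
(custodian-shutdown-all cust)
(printf "closed after shutdown? ~a\n" (port-closed? i))
```

I'd expect `#f` before the shutdown and `#t` after, which fits the idea that the custodian, not the garbage collector, is what finally closes them.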
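There are two ways of fixing this file descriptor leak:
1. Shut down the custodian that owns the ports
2. Close the ports ourselves with `close-input-port` and `close-output-port`, like the docs say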
Number one is worth mentioning to showcase how deep you can go with Racket coding. To tell a custodian to release and free the resources it manages, the function to use is `custodian-shutdown-all`. It's a cool function that does a total shutdown of a given custodian, but... it is probably more reckless than needed.
```racket
(define (ff-open url)
  (define-values (s i o e)
    (subprocess #f #f 'stdout firefox url))
  (subprocess-wait s)
  (custodian-shutdown-all (current-custodian)))
```
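Now the result is simply the value returned by `custodian-shutdown-all`, which is still going to be `#<void>`. I didn't really get why at first, but returning void is the usual convention for Racket procedures called only for their side effects, so the return value isn't telling us anything about success or failure.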
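Anyways, let's not go with this solution. Why? Because if I were using any other resources managed by the current custodian elsewhere in this program, they would also get shut down, and in a plain script that includes the main thread itself. The way around this is to create a new custodian and have the new ports be assigned to it, a sort of ... employee of the main custodian, since `make-custodian` creates it as a subordinate of the current one.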
```racket
(define cust (make-custodian))

(define (ff-open url)
  (parameterize ([current-custodian cust])
    (define-values (s i o e)
      (subprocess #f #f 'stdout firefox url))
    (subprocess-wait s)
    (custodian-shutdown-all cust)))
```
Looks valid to me! Except, uh-oh, we can't reuse a custodian once it's been shut down, so we'd have to create a new one for each call. Let's move the custodian inside the function then.
```racket
(define (ff-open url)
  (define cust (make-custodian))
  (parameterize ([current-custodian cust])
    (define-values (s i o e)
      (subprocess #f #f 'stdout firefox url))
    (subprocess-wait s)
    (custodian-shutdown-all cust)))
```
There, much better. In testing, it does actually close out all the file descriptors, so this does what we would need it to do.
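Here is a sketch of that kind of test, using a hypothetical `ff-open-stub` with `true` (assumed to be on the PATH) standing in for Firefox, and Linux's `/proc/self/fd` for the count:

```racket
#lang racket/base
;; Sketch: check the fresh-custodian-per-call fix with a descriptor count.
;; Assumptions: Linux (/proc/self/fd) and a `true` executable on the PATH
;; standing in for firefox; ff-open-stub is a hypothetical name.
(define true-exe (find-executable-path "true"))

(define (count-open-fds)
  (length (directory-list "/proc/self/fd")))

(define (ff-open-stub)
  (define cust (make-custodian))           ; new custodian for every call
  (parameterize ([current-custodian cust])
    (define-values (s i o e) (subprocess #f #f 'stdout true-exe))
    (subprocess-wait s)
    (custodian-shutdown-all cust)))        ; closes i and o

(ff-open-stub)   ; warm-up call
(define before (count-open-fds))
(for ([_ (in-range 5)]) (ff-open-stub))
(define gained (- (count-open-fds) before))
(printf "descriptors gained over 5 calls: ~a\n" gained)
```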
Solution two is a bit easier, and that's why I'll go with it. Instead of managing a custodian by hand, it works directly with the ports we already have, closing them and freeing up their file descriptors. I guess you could say it's also less fun in that sense.
```racket
(define (ff-open url)
  (define-values (s i o e)
    (subprocess #f #f 'stdout firefox url))
  (subprocess-wait s)
  (close-output-port o)
  (close-input-port i))
```
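The error port needs no special handling: because we passed the symbol `'stdout` as the third argument, `subprocess` sends the child's errors to the same pipe as its standard output and returns `#f` for `e`. There is also no `close-error-port`; had we passed `#f` instead of `'stdout`, `e` would be an ordinary input port, closed with `close-input-port` just like `i`.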
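If you want solution two to hold up even when an exception or break escapes the wait, the closing can go in the post-thunk of `dynamic-wind`. The sketch below does that with a hypothetical `open-and-close` helper, also closes `e` when it is a port, and checks the descriptor count through `/proc/self/fd`, assuming Linux and a `true` executable standing in for Firefox:

```racket
#lang racket/base
;; Sketch of solution two made exception-safe. Assumptions: Linux
;; (/proc/self/fd) and a `true` executable on the PATH standing in for
;; firefox; open-and-close is a hypothetical name.
(define true-exe (find-executable-path "true"))

(define (count-open-fds)
  (length (directory-list "/proc/self/fd")))

(define (open-and-close exe . args)
  (define-values (s i o e) (apply subprocess #f #f 'stdout exe args))
  (dynamic-wind
   void
   (lambda () (subprocess-wait s))
   (lambda ()
     ;; runs however the wait is left, normally or via an escape
     (close-output-port o)
     (close-input-port i)
     ;; e is #f because of 'stdout; if stderr were #f it would be an
     ;; input port and need close-input-port too
     (when e (close-input-port e)))))

(open-and-close true-exe)   ; warm-up call
(define before (count-open-fds))
(for ([_ (in-range 5)]) (open-and-close true-exe))
(define gained (- (count-open-fds) before))
(printf "descriptors gained over 5 calls: ~a\n" gained)
```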
Ah yes, that looks like the plain solution to me. Except, wait a minute, didn't this all start because our program was creating new ports? Doesn't Racket get its own ports for input, output, and error? Is there no way to reuse those? We quite literally just asked Racket to reuse a port for the standard error stream.
Yes, and that leads to solution number three, which may or may not suit you, depending on whether you mind text from other programs getting mixed into the output of your regular Racket code. But it does avoid the file descriptor leak, because we aren't creating any new pipes.
```racket
(define (ff-open url)
  (define-values (s i o e)
    (subprocess (current-output-port) (current-input-port) 'stdout firefox url))
  (subprocess-wait s))
```
This final solution, which may not be what you need, lets the child program share the parent Racket process's ports for output and input, which we grab simply by calling their parameter procedures. Note that `subprocess` only accepts file-stream ports here, so this works when running `racket` from a terminal but not inside DrRacket. Garbage text may also get in the way, so this one is not (in my eyes) the recommended one.
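A quick way to confirm that solution three has nothing to close: when `subprocess` is given existing ports, it creates no pipes for them and returns `#f` in their place. A sketch, assuming a terminal run (so the current ports are file-stream ports) and `true` on the PATH in place of Firefox:

```racket
#lang racket/base
;; Sketch: when subprocess is handed existing ports, it creates no new pipes
;; and returns #f in place of each port, so there is nothing left to close.
;; Assumptions: the current ports are file-stream ports (true when running
;; `racket file.rkt` from a terminal, not inside DrRacket) and a `true`
;; executable on the PATH stands in for firefox.
(define true-exe (find-executable-path "true"))

(define-values (s i o e)
  (subprocess (current-output-port) (current-input-port) 'stdout true-exe))
(subprocess-wait s)

(printf "ports returned: ~a ~a ~a\n" i o e)   ; all three are #f
```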
But it also doesn't leak! Because it's not creating new ones, this problem could have been avoided entirely if I re-used ports, which I ignored. But either way, I would rather not have program garbage text be placed into my standard output stream, so I will still go with solution number two.
Thanks for reading and hope you had some file descriptor fun with me.