Patrick the purple dragon

dragoncoder047’s blog

random thoughts about nonrandom things

Order Up

← Previous: The God Language Next: The Lesser of Two Evils
Posted
By dragoncoder047

I’ve been tinkering with David Johnson-Davies’ uLisp interpreter for a while now. It’s designed to be small first and foremost, so that the core part that fits on the rather wimpy ATmega328 (i.e. an Arduino Uno) will still achieve maximum usefulness. Most of the optimization is achieved by storing the built-in function lookup table in flash, instead of RAM, so that there is more RAM available for the program.

One of the core data types in Lisp is a stream. They are used to provide input and output to the program, and the typical microcontroller has no less than four of them, corresponding to the types of data buses that the microcontroller can have: UART, SPI, I2C, SD, etc., as well as the types builtin to Lisp itself, such as streams that read from or print to an in-memory string. David’s solution seemed kind of strange to me, as he defined a lookup table of the names of all built-in streams, and then each stream object in memory simply contained in index into that table. The strange part was that the actual function pointers to print to and read from each stream were not in an actual C++ table (which is probably what I would have done); David created lookup functions that when passed the index to the names table, returned the coresponding print or read function pointer.

That weirdness also makes it a little difficult for extension authors like myself to add new streams to uLisp. The current extension facility David added in version 4.4 only supports adding new functions, keywords, and special forms. Everything else, such as special global variables, reader macros, new types, and new streams, must be manually patched into the uLisp source file.

I started to think about how to re-implement the whole uLisp type system to add more flexibility. The first change was simple: David had again made the rather strange design choice to have built-in functions hold the index of their lookup table entry, rather than the actual memory address. This means when actually operating on up the lookup table entry’s data, an extra operation must be done (that of indexing the table to get the actual entry).

The second change was to completely re-implement streams. As well as making it easier to add new streams, I also wanted to remove a couple limitations of streams, such as the fact that file streams store the open file object in a global variable and not the actual stream object; the consequences of this is that you can only have two files open at once: one for reading, one for writing.

After trying some other solutions that went nowhere I eventually remembered one function that I came across when messing around with LIL – the OpenBSD libc function funopen(3). This allows you to create custom streams by specifying functions to read, write, seek, close the stream, along with an implementation-defined “cookie” object holding some data for the stream. The cookie was they key here, as it allowed the stream functions to be generic and operate on a class of streams, rather than one individual stream. This removes the restriction limiting uLisp to two files – neat.

After drafting up an in-memory layout for the stream object, I ran into a potential problem. Whenever uLisp’s garbage collector runs, it sweeps the Lisp cells from bottom-to-top, and reassembles the free-cell list from the unmarked cells, in reverse order, so each free cell is always at a lower memory address than the next. When cells are pulled off the free list and put into use, they are pulled off top-to-bottom as with a stack. The problem arises when I create the stream object – since it takes up multiple uLisp cells, some cells will be garbage-collected first. But if the stream hasn’t been closed before it is garbage-collected, it will need to be closed, but that can only be done if the “body” cells of the stream object are still intact. I came up with a extremely stupid solution at first: throw in another API function, that runs before each garbage collection, and simply mark all of the stream objects’ payloads (but not the heads), regardless of whether they are actually reachable, so that the head will always get swept first because it is the only cell that isn’t marked.

But this is probably not the best solution by any metric (except for stupidity). The garbage collector is already sweeping the workspace cells array once, and this would basically add a second sweep, doubling the time for a garbage collection. I haven’t investigated the invariants of the free-cell list’s order very closely, so it may be as simple as always allocating the “head” cell of the stream object last, so that it will be swept first. I have to ensure the order in which the cells are swept, because it could potentially break a stream and cause a memory leak if the stream gets corrupted before it gets closed. Hopefully, between myself and David, uLisp will be improved.


Related Posts