Pickle Tokenizer
Modified By dragoncoder047
This post is part 3 of the pickle series:
- Pickles!
- Manual Memory Management Madness
- Pickle Tokenizer
- Yet Another Garbage Collector
- Powerful PICKLE Pattern Matching
- PICKLE Has Regular Expressions, Apparently
- It's September!!
- Continuations and the thunk queue
- The Lesser of Two Evils
- A Hash-Mapped Mess
I’m starting to work on my Pickle programming language, this time in Javascript. After only a few days’ work, I’m surprised I got so much working. Currently I have both the tokenizer and the inheritance system working. The syntax of Pickle is pretty much in place now, and I just have a few tweaks left for the tokenizer, and hooking it up to a parser, before I am able to write the evaluator.
Tokenizer
In contrast to my previous attempt in C, I actually wrote a tokenizer. But in all honesty, the tokenizer really does 90% of the parsing – it just doesn’t recursively build up a tree of expressions using the parens (that will be done by the parser).
I got the tokenizer working in a little demo page I dubbed the “seeder” for some reason. Currently it is able to tokenize this code:
defun fib(x): if $x == 0 or $x == 1: return 1 else: return (fib $x - 1) + (fib $x - 2) print (fib 10)
into this stream of tokens:
[EDIT Apr. 21: The above is now a live editor using the tokenizer as it was when I wrote this post – write away and the tokens will update automatically. The tokenization may have changed since then.]
The tokenizer is also a bit unique in that it can recover from a syntax error and keep scanning, allowing you to see and fix multiple syntax errors all at once. And it’s also nice that the Ace.js code editor allows you to place annotation markers in the gutter, which is what I did in the “seeder”.
The only bug I can see here with this example is that the colon-block string part consumes the newline at the end of the block, so the print
is considered to be on the same logical line, but it shouldn’t be. A simple ;
before the print
would fix that, but I feel that this kind of indented block structure where an unindent ends both the block and the line, would be more common than having the string continue the line – in other words, having the default be to continue it and adding punctuation to end it would result in more “punctuation overload” versus having a special punctuation character mean continue the line and the default be to end it. Unfortunately, the former appears to be whet is implemented in the Javascript tokenizer.
Inheritance
In my earlier post about Pickle, I mentioned that Pickle would have a multiprototype-based inheritance system, a strange mix of Python and Javascript. Python supports multiple inheritance, but chokes on “ambiguous” inheritance trees, while Javascript only supports single inheritance through prototypes. But I think I’ve found a simple solution that implements multiprototype-based inheritance. Here’s a pared-down example:
class PickleObject {
constructor(name, ...prototypes) {
this.name = name;
this.prototypes = prototypes;
}
toJSON() {
return this.name;
}
getMRO() {
var fun = x => [x].concat(x.prototypes.map(fun));
return fun(this).flat(Infinity);
}
}
var A = new PickleObject("A");
var B = new PickleObject("B");
var X = new PickleObject("X", A, B);
var Y = new PickleObject("Y", B, A);
var Crash = new PickleObject("Crash", X, Y);
alert(JSON.stringify(Crash.getMRO()));
// -> ["Crash","X","A","B","Y","B","A"]
This is exactly the same code that I posted earlier that Python can’t handle – but here, Crash.getMRO()
simply returns a flat array that can be searched linearly. I’m not sure how fast this is, but I do have some optimization tricks that I could apply.
What’s next?
I don’t know exactly what, but Pickle is still only half-written. After I write the parser, I’ll need to then write the evaluator. And the evaluator is going to be extremely complicated and probably very slow, although I do hope it will be somewhat readable due to Javascript’s built-in functional programming constructs that C doesn’t have natively.
Pickle does look like it’s going to be simpler than Phoo, certainly. Although Phoo did get complicated because I split everything into a zillion different files. One huge file for everything may be a bit much, but having a bazillion files and none have any more than 100 lines apiece is also a bit much. Aside from the weird operator semantics, I do hope Pickle’s flow will be easier to follow.
Related Posts
- Continuations and the thunk queue
- Pointer Soup
- A Hash-Mapped Mess
- The Lesser of Two Evils
- The God Language