Interpreter Exception Handling Implementation, Part 1

Recently, I read the book crafting interpreters by Bob Nystrom (@munificent) and like many other interpreter tutorials it stops short of implementing exception handling, citing complexity. Given that the book already implements its target language twice, and after implementing it, I will break this into three posts, I can agree that it would make the book a bit too long. I mean, the Dragon Book doesn’t even cover it. However, there’s nothing stopping me from writing these posts, so I’m gonna do that.

I am assuming you’ve read the book and have an otherwise vanilla copy of the C implementation of the ‘lox’ language. I’ve used a clone from the book’s repo, which includes book markup in the code, and might yet be further updated. The code snippets are patches, which in theory will apply cleanly if so desired, but I recommend doing it The Hard Way. I’m keeping with the book’s tenet of including every line of code, so these posts will be code-heavy.

disclaimer:
This should not be considered a reference implementation, it’s the first thing I came up with after reading the Python interpreter’s source code and discovering that their approach wouldn’t fit the clox implementation directly, but clearly inspires the result. It therefore might not be super efficient or even a good way of doing it, but it is a way that works.

When I talk about exception handling, I intend to mimic the behaviour of most other languages that support such a feature (C++ being the notable exception). So let’s start with the throw keyword – it should signify that the result of the following expression is to be thrown like an exception.

--- a/c/scanner.h
+++ b/c/scanner.h
@@ -23,7 +23,7 @@ typedef enum {
   TOKEN_AND, TOKEN_CLASS, TOKEN_ELSE, TOKEN_FALSE,
   TOKEN_FOR, TOKEN_FUN, TOKEN_IF, TOKEN_NIL, TOKEN_OR,
   TOKEN_PRINT, TOKEN_RETURN, TOKEN_SUPER, TOKEN_THIS,
-  TOKEN_TRUE, TOKEN_VAR, TOKEN_WHILE,
+  TOKEN_THROW, TOKEN_TRUE, TOKEN_VAR, TOKEN_WHILE,
 
   TOKEN_ERROR,
   TOKEN_EOF
--- a/c/scanner.c
+++ b/c/scanner.c
@@ -158,7 +158,18 @@ static TokenType identifierType() {
     case 't':
       if (scanner.current - scanner.start > 1) {
         switch (scanner.start[1]) {
-          case 'h': return checkKeyword(2, 2, "is", TOKEN_THIS);
+          case 'h': {
+            if (scanner.current - scanner.start > 2)
+            {
+              switch (scanner.start[2]) {
+                case 'i':
+                  return checkKeyword(3, 1, "s", TOKEN_THIS);
+                case 'r':
+                  return checkKeyword(3, 2, "ow", TOKEN_THROW);
+              }
+            }
+            break;
+          }
           case 'r': return checkKeyword(2, 2, "ue", TOKEN_TRUE);
         }
       }
--- a/c/compiler.c
+++ b/c/compiler.c
@@ -980,6 +980,7 @@ ParseRule rules[] = {
 /* Compiling Expressions rules < Types of Values table-true
   [TOKEN_TRUE]          = {NULL,     NULL,   PREC_NONE},
 */
+  [TOKEN_THROW]         = {NULL,     NULL,   PREC_NONE},
 //> Types of Values table-true
   [TOKEN_TRUE]          = {literal,  NULL,   PREC_NONE},
 //< Types of Values table-true

That covers the normal process of adding a new token – adding the token type, scanner support and ensuring that the Pratt Parser table is updated.

Now that we know about the token, we should handle it in the compiler. This also means a new VM operation which we will output to do the actual throwing.

--- a/c/chunk.h
+++ b/c/chunk.h
@@ -98,8 +98,9 @@ typedef enum {
   OP_INHERIT,
 //< Superclasses inherit-op
 //> Methods and Initializers method-op
-  OP_METHOD
+  OP_METHOD,
 //< Methods and Initializers method-op
+  OP_THROW,
 } OpCode;
 //< op-enum
 //> chunk-struct
--- a/c/compiler.c
+++ b/c/compiler.c
@@ -1381,6 +1382,13 @@ static void whileStatement() {
   emitByte(OP_POP);
 }
 //< Jumping Back and Forth while-statement
+
+static void throwStatement() {
+  expression();
+  consume(TOKEN_SEMICOLON, "Expect ';' after value.");
+  emitByte(OP_THROW);
+}
+
 //> Global Variables synchronize
 static void synchronize() {
   parser.panicMode = false;
@@ -1397,6 +1405,7 @@ static void synchronize() {
       case TOKEN_WHILE:
       case TOKEN_PRINT:
       case TOKEN_RETURN:
+      case TOKEN_THROW:
         return;
 
       default:
@@ -1466,6 +1475,8 @@ static void statement() {
     block();
     endScope();
 //< Local Variables parse-block
+  } else if (match(TOKEN_THROW)) {
+    throwStatement();
 //> parse-expressions-statement
   } else {
     expressionStatement();

It’s pretty straight-forward. Find a throw and output an OP_THROW once the expression forming the argument is complete and the result is on top of the stack. Of course the lox language finishes statements with semi-colons, so expect one of those, too. The throw statement is also a convenient place to synchronize, so we’ll go ahead and put it there, too.

Once we have the instruction output to the vm, it’d be pretty useful to see what’s happening in the debug output, so let’s add that.

--- a/c/debug.c
+++ b/c/debug.c
@@ -220,6 +220,8 @@ int disassembleInstruction(Chunk* chunk, int offset) {
     case OP_METHOD:
       return constantInstruction("OP_METHOD", chunk, offset);
 //< Methods and Initializers disassemble-method
+    case OP_THROW:
+      return simpleInstruction("OP_THROW", offset);
     default:
       printf("Unknown opcode %d\n", instruction);
       return offset + 1;

We have to deal with the actual functionality of that in the virtual machine. I’ve added two functions to do this, which each use a little foresight from the complete implementation. Getting the stacktrace from the exact place the exception is thrown is pretty important for tracing the exceptions, so we’ll do that as a separate task that doesn’t alter the call stack. Lox doesn’t have arrays (or lists) so we’re returning a newline-separated string. It’s pretty heavy on allocations, so don’t expect to be able to throw an out-of-memory exception and have the interpreter behave sensibly. Of course the functions you can safely call in such a scenario are pretty limited, and printf is not on that list. Allowing the program to crash in that scenario is a reasonable approach.

--- a/c/vm.c
+++ b/c/vm.c
@@ -165,6 +165,42 @@ static Value peek(int distance) {
   return vm.stackTop[-1 - distance];
 }
 //< Types of Values peek
+
+static Value getStackTrace(void)
+{
+#define MAX_LINE_LENGTH 512
+  int maxStackTraceLength = vm.frameCount * MAX_LINE_LENGTH;
+  char *stacktrace = ALLOCATE(char, maxStackTraceLength);
+  uint16_t index = 0;
+  for (int i = vm.frameCount - 1; i >= 0; i--)
+  {
+      CallFrame *frame = &vm.frames[i];
+      ObjFunction *function = frame->closure->function;
+      // -1 because the IP is sitting on the next instruction to be executed.
+      size_t instruction = frame->ip - function->chunk.code - 1;
+      uint32_t lineno = function->chunk.lines[instruction];
+      index += snprintf(
+          &stacktrace[index],
+          MAX_LINE_LENGTH,
+          "[line %d] in %s()\n",
+          lineno,
+          function->name == NULL ? "script" : function->name->chars);
+  }
+  stacktrace = GROW_ARRAY(char, stacktrace, maxStackTraceLength, index);
+  return OBJ_VAL(takeString(stacktrace, index));
+#undef MAX_LINE_LENGTH
+}
+
+void propagateException(void) {
+  ObjInstance *exception = AS_INSTANCE(peek(0));
+  fprintf(stderr, "Unhandled %s\n", exception->klass->name->chars);
+  Value stacktrace;
+  if (tableGet(&exception->fields, copyString("stacktrace", 10), &stacktrace)) {
+    fprintf(stderr, "%s", AS_CSTRING(stacktrace));
+    fflush(stderr);
+  }
+}
+
 /* Calls and Functions call < Closures call-signature
 static bool call(ObjFunction* function, int argCount) {
 */

The getStackTrace function walks the call stack and traces the path to how we got where we are, noting the function names and line numbers as we go. We’ve made a concession that a line will never be more than 512 characters long, which is probably long enough. We do resize the string to the exact space used at the end of the function, so at least we aren’t wasting space eventually.

propagateException is pretty simple at this stage. It tells the user which type was thrown and attempts to get a field called stacktrace that is a string, and prints that out. In practice, because this is a dynamic language, this means we can throw any object type and we’ll set that during the OP_THROW handling. I haven’t included guards for attempting to throw a value-type like a bool or number.

--- a/c/vm.c
+++ b/c/vm.c
@@ -884,6 +920,13 @@ static InterpretResult run() {
         defineMethod(READ_STRING());
         break;
 //< Methods and Initializers interpret-method
+      case OP_THROW: {
+        Value stacktrace = getStackTrace();
+        ObjInstance* instance = AS_INSTANCE(peek(0));
+        tableSet(&instance->fields, copyString("stacktrace", 10), stacktrace);
+        propagateException();
+        return INTERPRET_RUNTIME_ERROR;
+      }
     }
   }

And that’s throwing exceptions. Because we can’t catch exceptions, yet, we always return an error once we encounter an instruction to throw an exception.

While this doesn’t introduce full exception handling, this on its own is still useful. We can now exit lox programs early with an error condition if we detect something has gone wrong, giving an insight to the user as to how we got where the problem occurred.

Futher challenges:

  • How could we minimise the amount of memory allocated for the stacktrace string? What are the trade-offs imposed?
  • What should we do in the case that a user attempts to throw a value-type?

Stay tuned for part 2 and part 3.

Advertisement