verona Error handling, the next level C++

This is a long story, not a quick-fix issue. It's an overall design "document".

In #168 / #177 we introduced LLVM's Error and Expected pattern for error handling. This works well in the MLIR generator, but we still have a few problems:

First, only the MLIR generator is using it, while the Parser uses its own handler. This makes it harder to collate errors into a single hierarchical structure.

Second, the error messages are baked in (in English) inside the generating code. This makes it harder to display errors in different settings (console, pop-ups, status bars).

Finally, the errors are final and can't be composed or returned as a list. This makes it harder to do partial compilation or to ignore errors in order to get as many as possible to show the user.

These three problems are important to get right early on, as they get harder to fix the more code we have. They're also an important part of our goals for the Verona compiler (partial compilation, helper tools, etc).

Asked Oct 08 '21 14:10
rengolin

4 Answers:

To fix the first problem, we just need to take a decision on which error handling we'll use for the project as a whole, and then propagate that to all interacting sub-projects. For now, that's mostly the AST parser and MLIR generator. But we need to take a decision that makes sense for both projects, and all future integrations later (driver, LLVM handler).

1
Answered Sep 08 '20 at 11:31
rengolin

To fix the second problem, the common and simple approach is to make ParsingError and RuntimeError base classes for derived behaviour, with constructors that don't propagate the meaning as text, but as which derived class we use. The front-ends then can choose how to report them, using the internal structure to populate the error messages.

For example:

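// The error type carries structured data, not a rendered message.
struct IncompatibleType : public ParsingError {
  AST type1, type2;
  ...
};
// At the point of failure:
return llvm::make_error<IncompatibleType>(type1, type2, loc);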

The front-end would receive that object and use type1, type2 and loc directly instead of printing an already rendered string.
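The front-end side of this could look roughly like the sketch below. It is plain C++ with no LLVM dependency so that it stands alone; SourceLoc, renderForConsole and renderForStatusBar are hypothetical names, and the types are held as strings rather than AST nodes. In the real code the classes would presumably derive from llvm::ErrorInfo so they can travel inside an llvm::Error.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the compiler's source location type.
struct SourceLoc {
  std::string file;
  unsigned line;
  unsigned column;
};

// Base class: carries where the error happened, never a rendered message.
struct ParsingError {
  SourceLoc loc;
  explicit ParsingError(SourceLoc l) : loc(std::move(l)) {}
  virtual ~ParsingError() = default;
};

// The derived class is the meaning; its fields are the details.
struct IncompatibleType : ParsingError {
  std::string type1, type2; // type names as written in the source
  IncompatibleType(std::string t1, std::string t2, SourceLoc l)
      : ParsingError(std::move(l)), type1(std::move(t1)), type2(std::move(t2)) {}
};

// One possible front-end: a console renderer that owns the wording.
std::string renderForConsole(const ParsingError& err) {
  std::string where = err.loc.file + ":" + std::to_string(err.loc.line) + ":" +
                      std::to_string(err.loc.column);
  if (auto* e = dynamic_cast<const IncompatibleType*>(&err))
    return where + ": error: incompatible types '" + e->type1 + "' and '" +
           e->type2 + "'";
  return where + ": error: unknown parsing error";
}

// Another front-end: a status bar that only wants a short summary.
std::string renderForStatusBar(const ParsingError& err) {
  if (dynamic_cast<const IncompatibleType*>(&err))
    return "Type mismatch (line " + std::to_string(err.loc.line) + ")";
  return "Parse error (line " + std::to_string(err.loc.line) + ")";
}
```

Both renderers read the same error object, so the wording (and its language) lives entirely in the front-end. A visitor or a virtual method per output kind would avoid the dynamic_cast chain; the cast is used here only to keep the sketch short.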

We'd probably need some object-to-text conversion that returns some UTF-8 string, but since those will match the source code, we don't need to worry about internationalisation in the error objects themselves.

1
Answered Sep 08 '20 at 11:36
rengolin

The third problem is harder. We'll need to change the whole structure of the code.

The parser should be simple, as it can "skip tokens" until some closing token is found (e.g. }) and continue from there.
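As a rough illustration of the "skip tokens" idea, here is a minimal sketch, assuming a token stream where each token is just its spelling. Tokens and skipToClosingBrace are hypothetical names, not part of the existing parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token stream: each token is just its spelling.
using Tokens = std::vector<std::string>;

// Skip forward from `pos` to just past the "}" that closes the current block.
// `depth` counts braces opened during the skip, so that an inner "}" does not
// end the recovery early. Returns tokens.size() if no closing brace exists.
std::size_t skipToClosingBrace(const Tokens& tokens, std::size_t pos) {
  int depth = 0;
  for (; pos < tokens.size(); ++pos) {
    if (tokens[pos] == "{") {
      ++depth;
    } else if (tokens[pos] == "}") {
      if (depth == 0)
        return pos + 1; // resume parsing right after the closer
      --depth;
    }
  }
  return pos;
}
```

The parser would record the error at the offending token, call something like this, and carry on parsing from the returned position, so later errors in the same file can still be found.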

The MLIR generator, however, assumes that the AST is well formed, so any error in nested blocks will likely be type errors, or worse, transform errors, and will need to "skip the block", which may not be semantically valid to do at that late point (or easy to report back, with source code location).

The easy way out is to treat functions/classes as isolated constructs and only take one error per top-level construct. But a better approach would be to only add multiple error handling to the areas that make sense (loops, conditionals, lexical blocks). This will add complexity to the generator, so if we can think of an automated way of doing that (some form of dispatcher), that'd make things a lot easier.
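The "easy way out" could be sketched as below, assuming each top-level construct either lowers cleanly or yields exactly one error. TopLevel, GenError and generateAll are hypothetical names, and the wellFormed flag stands in for actually running the generator on that construct. With LLVM's own types, llvm::joinErrors is one existing way to combine several Error values into one.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for a top-level construct and a structured error.
struct TopLevel {
  std::string name;
  bool wellFormed; // pretend result of lowering this construct
};

struct GenError {
  std::string construct; // which function/class failed
};

// Lower every top-level construct, keeping at most one error per construct.
// A failure abandons only that construct; the loop moves on to the next one.
std::vector<GenError> generateAll(const std::vector<TopLevel>& module,
                                  std::vector<std::string>& emitted) {
  std::vector<GenError> errors;
  for (const TopLevel& item : module) {
    if (!item.wellFormed) {
      errors.push_back(GenError{item.name});
      continue;
    }
    emitted.push_back(item.name);
  }
  return errors;
}
```

The caller gets both the constructs that were generated and the full list of failures, which is what partial compilation needs. Finer-grained recovery (loops, conditionals, lexical blocks) would apply the same pattern one level down.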

1
Answered Sep 08 '20 at 11:42
rengolin

We're writing a new parser, which will probably change a lot in the current MLIR layer. While these points are still very relevant, the final solution might be completely different. Closing this for now. We can open a new issue for the new pipeline when we get there.

1
Answered Feb 26 '21 at 13:29
rengolin