Refactors Codegen to separate the core "visit IR as a tree" logic from the code
that emits JS:
* `HIRTreeVisitor` is a new helper that visits the HIR as a tree. You call
`visitTree(ir, yourVisitor)` and it drives visiting of the IR, tracking blocks
and scopes and calling methods as appropriate.
* `Codegen` is now implemented as a visitor. For example, `enterBlock()` creates
an empty `Array<t.Statement>`, `leaveBlock()` wraps that in a
`t.BlockStatement`, etc.
* `printHIRTree()` is a new IR printer (implemented as a visitor) that prints
the HIR in tree form, so it retains the original shape of the code but with each
block replaced with its IR equivalent.
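To illustrate the split, here is a minimal sketch of the pattern over a toy tree rather than the real HIR. The `TreeNode` shape, the `visitStatement`/`appendBlock` methods, and the string output are assumptions for illustration; only the `visitTree`, `enterBlock()` and `leaveBlock()` names come from the description above.

```typescript
// A toy tree standing in for the HIR (hypothetical shape, not the real IR types).
type TreeNode =
  | { kind: "stmt"; code: string }
  | { kind: "block"; body: TreeNode[] };
type BlockNode = Extract<TreeNode, { kind: "block" }>;

// The visitor decides what each step produces; the driver only decides the order.
interface Visitor<TBlock, TStmt> {
  enterBlock(): TBlock;
  visitStatement(block: TBlock, code: string): void;
  appendBlock(block: TBlock, nested: TStmt): void;
  leaveBlock(block: TBlock): TStmt;
}

// Drives a depth-first walk, calling the visitor as blocks open and close.
function visitTree<TBlock, TStmt>(root: BlockNode, visitor: Visitor<TBlock, TStmt>): TStmt {
  const block = visitor.enterBlock();
  for (const child of root.body) {
    if (child.kind === "stmt") {
      visitor.visitStatement(block, child.code);
    } else {
      visitor.appendBlock(block, visitTree(child, visitor));
    }
  }
  return visitor.leaveBlock(block);
}

// A codegen-like visitor: enterBlock() starts an empty list, leaveBlock() wraps it in braces.
const printer: Visitor<string[], string> = {
  enterBlock: () => [],
  visitStatement: (block, code) => {
    block.push(code);
  },
  appendBlock: (block, nested) => {
    block.push(nested);
  },
  leaveBlock: (block) => `{ ${block.join(" ")} }`,
};

const tree: BlockNode = {
  kind: "block",
  body: [
    { kind: "stmt", code: "a();" },
    { kind: "block", body: [{ kind: "stmt", code: "b();" }] },
  ],
};
console.log(visitTree(tree, printer)); // { a(); { b(); } }
```

Swapping in a different visitor (for example one that prints IR text instead of JS) reuses the same traversal, which is the benefit of keeping `visitTree` separate from code emission.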
The new pretty-printed scope syntax breaks mermaid labels because the `@`
character appears to be reserved. This wraps all labels in quotes so they render
correctly again.
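A minimal sketch of the quoting, assuming the labels go into mermaid flowchart nodes. The `mermaidNode` helper and the sample label text are hypothetical; `#quot;` is mermaid's entity code for a literal double quote inside a quoted label.

```typescript
// Wrap a node label in double quotes so mermaid treats characters like `@` as plain text.
// Any double quotes inside the label are escaped with mermaid's `#quot;` entity code.
function mermaidNode(id: string, label: string): string {
  const escaped = label.replace(/"/g, "#quot;");
  return `${id}["${escaped}"]`;
}

console.log(mermaidNode("bb0", 'scope @0 "x"')); // bb0["scope @0 #quot;x#quot;"]
```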
Expands InferReactiveScopeVariables to update the mutableRange of all
identifiers to be the range of their scope. As a result, all identifiers in a
given scope have the same range: its start is the minimum of the identifiers'
range starts, and its end is the maximum of their range ends.
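A minimal sketch of the range update, assuming each identifier already carries its scope id and a numeric `{ start, end }` mutable range. The `Ident` and `MutableRange` shapes and the `alignScopeRanges` name are hypothetical simplifications, not the compiler's actual types.

```typescript
// Simplified shapes (hypothetical): numeric instruction range plus the assigned scope id.
type MutableRange = { start: number; end: number };
type Ident = { id: number; scope: number; mutableRange: MutableRange };

// Widens every identifier's range to span all ranges in its scope.
function alignScopeRanges(identifiers: Ident[]): void {
  const scopeRanges = new Map<number, MutableRange>();
  // First pass: compute min(start) and max(end) per scope.
  for (const ident of identifiers) {
    const range = scopeRanges.get(ident.scope);
    if (range === undefined) {
      scopeRanges.set(ident.scope, { ...ident.mutableRange });
    } else {
      range.start = Math.min(range.start, ident.mutableRange.start);
      range.end = Math.max(range.end, ident.mutableRange.end);
    }
  }
  // Second pass: give each identifier its own copy of the scope's range.
  for (const ident of identifiers) {
    ident.mutableRange = { ...scopeRanges.get(ident.scope)! };
  }
}

const ids: Ident[] = [
  { id: 1, scope: 0, mutableRange: { start: 1, end: 3 } },
  { id: 2, scope: 0, mutableRange: { start: 2, end: 6 } },
  { id: 3, scope: 1, mutableRange: { start: 4, end: 5 } },
];
alignScopeRanges(ids);
console.log(ids.map((i) => `${i.id}:[${i.mutableRange.start}, ${i.mutableRange.end}]`).join(" "));
// 1:[1, 6] 2:[1, 6] 3:[4, 5]
```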
This completes the implementation of InferReactiveScopeVariables, adding support
for phi nodes. Example:
```javascript
let x$0 = null;
mutate(x$0);
if (cond) {
x$1 = a;
mutate(x$1);
} else {
x$2 = b;
}
x$3 = phi(x$1, x$2);
mutate(x$3);
```
We now add x$1, x$2, and x$3 to the same reactive scope. This reflects the fact
that x$3 cannot be computed without also computing both x$1 and x$2. Note that
x$3 can never be x$0, so x$0 is _not_ added to the same scope. This allows us to
take advantage of SSA form to note that _some_ instances of an identifier really
are distinct.
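A minimal sketch of the phi handling on the example above, assuming identifiers are grouped with a union-find structure and that each phi's result is simply unioned with all of its operands. The `DisjointSet` class, the `PhiNode` shape, and string identifiers like `"x$1"` are illustrative assumptions.

```typescript
// Minimal union-find keyed by SSA identifier name (hypothetical representation).
class DisjointSet {
  private parent = new Map<string, string>();

  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every visited node directly at the root.
    let node = x;
    while (node !== root) {
      const next = this.parent.get(node)!;
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  union(a: string, b: string): void {
    const rootA = this.find(a);
    const rootB = this.find(b);
    if (rootA !== rootB) this.parent.set(rootB, rootA);
  }
}

type PhiNode = { result: string; operands: string[] };

// Puts each phi's result in the same set as all of its operands.
function unionPhis(sets: DisjointSet, phis: PhiNode[]): void {
  for (const phi of phis) {
    for (const operand of phi.operands) sets.union(phi.result, operand);
  }
}

const sets = new DisjointSet();
for (const id of ["x$0", "x$1", "x$2", "x$3"]) sets.find(id); // register every identifier
unionPhis(sets, [{ result: "x$3", operands: ["x$1", "x$2"] }]);
console.log(sets.find("x$1") === sets.find("x$3")); // true
console.log(sets.find("x$0") === sets.find("x$3")); // false
```

Because `x$0` never flows into the phi, it stays in its own set, matching the reasoning above.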
There's no option to run the test262 harness in silent mode, so every pass and
failure writes multiple lines to stdout. With many thousands of tests, this
produces unusable log files that are over 200k lines long. This PR redirects
stdout to a tmp file and then reformats the result into a small JSON object
that groups failures by message, with a count for each.
Example:
```json
[
  { "pass": false, "data": { "message": "Expected no error, got Error: TODO: Support complex object assignment", "count": 66 } },
  { "pass": false, "data": { "message": "Expected no error, got Error: TODO: lowerExpression(FunctionExpression)", "count": 4 } },
  { "pass": false, "data": { "message": "Expected no error, got Error: TODO: lowerExpression(UnaryExpression)", "count": 6 } },
  { "pass": false, "data": { "message": "Expected no error, got Error: todo: lower initializer in ForStatement", "count": 28 } },
  { "pass": false, "data": { "message": "Expected no error, got Invariant Violation: Expected value for identifier `15` to be initialized.", "count": 14 } },
  { "pass": false, "data": { "message": "Expected no error, got Invariant Violation: `var` declarations are not supported, use let or const", "count": 76 } },
  { "pass": true, "data": { "message": null, "count": 1 } }
]
```
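A minimal sketch of the grouping step, assuming the redirected output has already been parsed into one `{ pass, message }` record per test. The `TestResult` shape and the `summarize` function are hypothetical; the result has the same `{ pass, data: { message, count } }` shape as the JSON example.

```typescript
// Assumed shape of one parsed result from the redirected harness output (hypothetical).
type TestResult = { pass: boolean; message: string | null };
type SummaryEntry = { pass: boolean; data: { message: string | null; count: number } };

// Groups results by (pass, message), counting each group and keeping first-seen order.
function summarize(results: TestResult[]): SummaryEntry[] {
  const groups = new Map<string, SummaryEntry>();
  for (const result of results) {
    const key = JSON.stringify([result.pass, result.message]);
    const existing = groups.get(key);
    if (existing !== undefined) {
      existing.data.count += 1;
    } else {
      groups.set(key, { pass: result.pass, data: { message: result.message, count: 1 } });
    }
  }
  return Array.from(groups.values());
}

const summary = summarize([
  { pass: false, message: "Expected no error, got Error: TODO: lowerExpression(UnaryExpression)" },
  { pass: false, message: "Expected no error, got Error: TODO: lowerExpression(UnaryExpression)" },
  { pass: true, message: null },
]);
console.log(JSON.stringify(summary, null, 2));
```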
Adds a new workflow that runs the test262 tests on commits to main but not on
pull requests. This keeps the tests non-blocking on PRs while still letting us
track pass rates over time.
Adds a new pass `InferReactiveScopeVariables` which determines the sets of
variables (by Identifier) which "construct together" and belong in the same
reactive scope. Concretely, `Identifier` gets a new property `scope: ScopeId`,
and this pass assigns each identifier a ScopeId value. The algorithm iterates
over all instructions in all blocks (in a single pass) and builds up disjoint
sets of identifiers that appear as mutable operands in the same instruction.
The algorithm is relatively simple (especially since I had already implemented a
union-find data structure); however, looking at some examples reinforced that
other planned todos around alias analysis are really important. We also have to
think more about what "mutable lifetime" means in the context of SSA: currently,
variables that are reassigned (but never "mutated", e.g. because they're assigned
a value type) never appear as mutable.
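A minimal sketch of the single pass, assuming each instruction exposes the ids of its mutable operands. The `Instr` shape, the `UnionFind` class, and the dense numbering of scope ids are hypothetical simplifications; the real pass works on the compiler's own `Identifier` and `ScopeId` types.

```typescript
type IdentifierId = number;
// Simplified instruction: only the operands considered mutable here (assumption).
type Instr = { mutableOperands: IdentifierId[] };

class UnionFind {
  private parent = new Map<IdentifierId, IdentifierId>();

  // Returns the set representative, registering `x` as a singleton on first sight.
  find(x: IdentifierId): IdentifierId {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    return root;
  }

  union(a: IdentifierId, b: IdentifierId): void {
    const rootA = this.find(a);
    const rootB = this.find(b);
    if (rootA !== rootB) this.parent.set(rootB, rootA);
  }

  keys(): IdentifierId[] {
    return Array.from(this.parent.keys());
  }
}

// Walks every instruction once, merging identifiers that are mutable operands together.
function inferScopes(blocks: Instr[][]): Map<IdentifierId, number> {
  const sets = new UnionFind();
  for (const block of blocks) {
    for (const instr of block) {
      const ops = instr.mutableOperands;
      if (ops.length === 0) continue;
      sets.find(ops[0]);
      for (let i = 1; i < ops.length; i++) sets.union(ops[0], ops[i]);
    }
  }
  // Number each disjoint set densely: the first set seen becomes scope 0, and so on.
  const scopeOfRoot = new Map<IdentifierId, number>();
  const scopes = new Map<IdentifierId, number>();
  for (const id of sets.keys()) {
    const root = sets.find(id);
    if (!scopeOfRoot.has(root)) scopeOfRoot.set(root, scopeOfRoot.size);
    scopes.set(id, scopeOfRoot.get(root)!);
  }
  return scopes;
}

const scopes = inferScopes([
  [{ mutableOperands: [1, 2] }, { mutableOperands: [3] }],
  [{ mutableOperands: [2, 4] }],
]);
console.log(Array.from(scopes.entries()).map(([id, s]) => `${id}:${s}`).join(" ")); // 1:0 2:0 3:1 4:0
```

Since `2` appears in both blocks, identifier `4` ends up in the same scope as `1` even though they never share an instruction.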
Just realized we can run all tests without hitting the argument limit if a
string is passed in instead of an array of file paths.
This is much better because the test runner counts all tests in the parent test
directory together, rather than running the tests in each subdirectory
separately.
While running the test262 tests, I noticed that many variables were throwing an
invariant violation for being undefined. These include the special `arguments`
object, the global `assert` function used by test262, etc.
- Adds a shallow git submodule for test262, as the tests aren't available as an
npm module.
- To run all tests: `yarn test262:all`. Note that this chunks up the tests by
test262 folder, as there are over 50k tests and the test harness only accepts
arrays of file paths, which exceed the argument limit when passed all at once.
- To run a specific test: `yarn test262 test262/test/folder/file.js`. You can
also pass globs, which expand into an array of file paths:
`yarn test262 test262/test/folder/**/*.js`.
- More instructions for the test harness can be found here:
https://github.com/bterlson/test262-harness
I noticed on @kassens's #771 that, despite running LeaveSSA, there are still
cases where we reassign to a unique identifier: functions that have reassignment
but no phi nodes, such as:
```javascript
function foo() {
let x$1 = 0;
x$2 = x$1 + 1;
}
```
Here SSA form rewrote the LHS of the second statement, but because there's no phi
node we can't recover what the original was supposed to be (`x = x + 1`). This
was my oversight when suggesting the simpler LeaveSSA algorithm: it works for
eliminating phis but not for other reassignments. The only other way to leave
SSA form would be to add assignment statements, which we obviously don't want to
do since that generates bloat.
This PR addresses the issue by adding an additional, optional property to
`Identifier` called `preSsaId` that starts off null. When entering SSA we save
the original id in this property and update id to a new SSA value. LeaveSSA does
the inverse, setting `id = preSsaId` and nulling out the latter. This means that
an identifier can always be uniquely identified by its `id` value at any point
in the compiler, while undoing SSA form correctly remains trivial.
```typescript
type Identifier = {
// Unique value for each original identifier
id: IdentifierId;
// The original, un-mangled variable name if this was a variable present in the
// source (null if it's generated)
name: string | null;
// When in SSA mode, this is set to the original, pre-SSA `id` value
preSsaId: IdentifierId | null;
}
```
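A minimal sketch of the round trip, reusing the `Identifier` type from the description. The `enterSsa`/`leaveSsa` helpers and the `nextSsaId` counter are hypothetical and only show how `preSsaId` is saved and restored; the real passes also handle phis and rewrite every use of each identifier.

```typescript
type IdentifierId = number;

type Identifier = {
  id: IdentifierId;
  name: string | null;
  preSsaId: IdentifierId | null;
};

// Hypothetical source of fresh SSA ids.
let nextSsaId = 100;

// Entering SSA: remember the original id, then give this identifier a fresh one.
function enterSsa(identifier: Identifier): void {
  identifier.preSsaId = identifier.id;
  identifier.id = nextSsaId++;
}

// Leaving SSA: restore the original id and clear the saved value.
function leaveSsa(identifier: Identifier): void {
  if (identifier.preSsaId !== null) {
    identifier.id = identifier.preSsaId;
    identifier.preSsaId = null;
  }
}

// `let x = 0; x = x + 1;`: both definitions of `x` originally share id 1.
const def1: Identifier = { id: 1, name: "x", preSsaId: null };
const def2: Identifier = { id: 1, name: "x", preSsaId: null };
enterSsa(def1);
enterSsa(def2);
console.log(def1.id, def2.id); // 100 101 (distinct while in SSA form)
leaveSsa(def1);
leaveSsa(def2);
console.log(def1.id, def2.id); // 1 1 (both refer to the original `x` again)
```

No phi is needed to undo the rename, which is exactly the case the simpler LeaveSSA algorithm missed.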
Implements assignment expressions with operators other than `=` (such as `+=`)
by lowering them to a plain assignment, e.g. `x += 1` becomes `x = x + 1`.
I think this isn't fully correct for something like `a.b.c += 1`, but it seems
like there are more gaps in object accesses anyway.
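A minimal sketch of the lowering on a tiny, hypothetical expression AST (not Babel's node types). The `Expr`/`Assignment` shapes and the `lowerAssignment` name are assumptions; the target is restricted to a plain identifier.

```typescript
// A tiny expression AST (hypothetical; not Babel's node types).
type Expr =
  | { kind: "ident"; name: string }
  | { kind: "num"; value: number }
  | { kind: "binary"; op: string; left: Expr; right: Expr };
type Assignment = { target: string; value: Expr };

// Logical assignments only assign when the left side passes the check (`a ||= b`
// assigns only if `a` is falsy), so this sketch rejects them instead of lowering them.
const LOGICAL_ASSIGNMENT_OPS = new Set(["&&=", "||=", "??="]);

// Lowers `target op= value` to `target = target op value` for a plain identifier target.
function lowerAssignment(target: string, operator: string, value: Expr): Assignment {
  if (operator === "=") return { target, value };
  if (LOGICAL_ASSIGNMENT_OPS.has(operator) || !operator.endsWith("=")) {
    throw new Error(`unsupported assignment operator: ${operator}`);
  }
  const op = operator.slice(0, -1); // "+=" -> "+", ">>>=" -> ">>>", "**=" -> "**"
  return {
    target,
    value: { kind: "binary", op, left: { kind: "ident", name: target }, right: value },
  };
}

const lowered = lowerAssignment("x", "+=", { kind: "num", value: 1 });
console.log(JSON.stringify(lowered));
// {"target":"x","value":{"kind":"binary","op":"+","left":{"kind":"ident","name":"x"},"right":{"kind":"num","value":1}}}
```

Restricting the target to an identifier sidesteps the `a.b.c += 1` concern: duplicating a member expression as the left operand would evaluate `a.b` twice, so a correct lowering there would need to store `a.b` in a temporary first.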
- Get rid of `indent`, as it was making the code hard to read
- Remove an unnecessary second iteration over blocks
- Remove the extra newline between the bb subgraphs and the jumps section
- Remove trailing spaces
- Remove newlines between each subgraph and jump
We might need to revisit the split between tabs and options, but for now this
just adds a checkbox toggle that outputs codegenned JS instead of HIR in the HIR
tab. I'm open to ideas on how to organize this in the future.
This PR adds a new section to fixture tests, which renders the HIR into
a visualization using mermaid.js syntax which can then be embedded
directly into markdown.
The nice thing about the mermaid syntax is that it's quite readable, so
if desired we could replace the current basic block textual output with
the mermaid block. I'm opting to append it for now and wait for feedback on
whether we want to keep both or replace the textual output.
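A minimal sketch of the rendering, assuming each block has already been printed to instruction strings and knows its successor ids. The `BasicBlockInfo` shape, the `toMermaid` name, the instruction text, and the node naming scheme are hypothetical; the output is a mermaid flowchart with one subgraph per block followed by the jumps between blocks, and it assumes a mermaid version that supports edges between subgraphs.

```typescript
// Simplified view of a basic block (hypothetical shape): printed instructions plus successors.
type BasicBlockInfo = { id: string; instructions: string[]; successors: string[] };

// Renders blocks as a mermaid flowchart: one subgraph per block, then the jumps between them.
function toMermaid(blocks: BasicBlockInfo[]): string {
  const lines: string[] = ["flowchart TB"];
  for (const block of blocks) {
    // Quote the label (escaping inner quotes) and join instructions with <br/> line breaks.
    const label = block.instructions.map((s) => s.replace(/"/g, "#quot;")).join("<br/>");
    lines.push(`  subgraph ${block.id}`);
    lines.push(`    ${block.id}_instrs["${label}"]`);
    lines.push("  end");
  }
  for (const block of blocks) {
    for (const successor of block.successors) lines.push(`  ${block.id} --> ${successor}`);
  }
  return lines.join("\n");
}

const chart = toMermaid([
  { id: "bb0", instructions: ["[1] Const x = 0", "[2] Goto bb1"], successors: ["bb1"] },
  { id: "bb1", instructions: ["[3] Return x"], successors: [] },
]);
console.log(chart);
```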
To view the graphs in your editor, install an extension that
adds mermaid.js support:
https://mermaid-js.github.io/mermaid/#/integrations?id=editor-plugins. In VS Code
you can use the plugin by right-clicking any expect.md file and choosing "Open
Preview". No extra setup is required for GitHub, which should have built-in
support for mermaid in markdown.
Instead of hacking into the babel plugin passes, this now takes a new approach
for the HIR tab:
- The different passes are exported from the babel plugin (for simplicity)
- The tab actually runs the compiler steps based on local config in the tab.
- Re-purposed the CompilerFlags component to configure which passes to run. This
currently mostly surfaces different errors, but could be a useful direction
going forward.
The `--watch` argument forwarding stopped working because `yarn build` no longer
ends with the `tsc` command. There may be a way to forward the watch argument,
but for now this just calls `tsc --watch` directly.
Small adjustment to the previous PR for a special case:
```javascript
while (cond) {
break;
}
```
The loop body is an indirection to the fallthrough, so `shrink()` collapses it
and makes `while.loop === while.fallthrough`. We now detect this case in codegen
and correctly emit a `break` rather than trying to write the fallthrough block
inside the loop.
Adds a new 'while' terminal variant, which will be a model for other loop
terminals, and adds support for the entire compilation pipeline through codegen.
To understand the structure of the terminal consider this input:
```javascript
let x = 0;
while (x) {
x = foo(x);
}
return x;
```
We currently lower this to ifs and gotos:
```
bb0: precursor to loop
let x = 0;
goto(break) bb1; // <-- **The new terminal replaces this**
bb1: test block, whether to (re-)enter the loop
if (x) consequent=bb2 alternate=bb3;
bb2: loop body
x = foo(x);
goto(continue) bb1;
bb3: fallthrough after the loop
return x
```
This representation correctly models the semantics of while statements, but
loses the high-level information that there was a loop. The new 'while' terminal
replaces the first 'goto(break) bb1'. Conceptually, the 'while' terminal means
"enter the starting point of a while loop". In this example the terminal would
look like this:
```
{
kind: 'while',
testBlock: 'bb1', // the basic block that checks whether to enter the loop or not
loop: 'bb2', // the block containing the loop body
fallthrough: 'bb3' // the block that goes after the loop
}
```
Most passes will only look at 'testBlock', i.e. they will treat this terminal as
a simple goto to testBlock. However, codegen uses the full information in the
terminal to reconstruct the loop. My previous PR, #755, added a mechanism to be
smart about when to emit `break` statements; this PR improves upon that to emit
the minimal break and continue statements: omitting them entirely where they are
extraneous, emitting unlabeled break/continue when sufficient, and falling back
to labeled break/continue only where strictly necessary. The logic is very much
analogous to IR construction.
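A minimal sketch of how the new variant might sit in a terminal union, assuming a simplified set of terminals. Only the `while` fields (`testBlock`, `loop`, `fallthrough`) come from the description; the other variants, the `successors` helper, and `loopCollapsedIntoFallthrough` are hypothetical. It shows why most passes can treat `while` as a plain goto to `testBlock`, while codegen remains free to read `loop` and `fallthrough`.

```typescript
type BlockId = string;

// Simplified terminal union; only the 'while' fields mirror the description.
type Terminal =
  | { kind: "goto"; block: BlockId; variant: "break" | "continue" }
  | { kind: "if"; test: string; consequent: BlockId; alternate: BlockId }
  | { kind: "while"; testBlock: BlockId; loop: BlockId; fallthrough: BlockId }
  | { kind: "return"; value: string };
type WhileTerminal = Extract<Terminal, { kind: "while" }>;

// Control-flow successors as most passes see them: 'while' acts like `goto testBlock`.
function successors(terminal: Terminal): BlockId[] {
  switch (terminal.kind) {
    case "goto":
      return [terminal.block];
    case "if":
      return [terminal.consequent, terminal.alternate];
    case "while":
      return [terminal.testBlock];
    case "return":
      return [];
    default: {
      // Exhaustiveness check: adding a terminal kind without handling it is a type error.
      const unhandled: never = terminal;
      throw new Error(`unhandled terminal: ${JSON.stringify(unhandled)}`);
    }
  }
}

// Codegen-only check for the case where shrink() collapsed the body into the fallthrough.
function loopCollapsedIntoFallthrough(terminal: WhileTerminal): boolean {
  return terminal.loop === terminal.fallthrough;
}

// The terminal for bb0 in the example above.
const bb0Terminal: WhileTerminal = { kind: "while", testBlock: "bb1", loop: "bb2", fallthrough: "bb3" };
console.log(successors(bb0Terminal).join(",")); // bb1
console.log(loopCollapsedIntoFallthrough(bb0Terminal)); // false
```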
Alternative approach to #750. We now store the original identifier on the Phi
node, then rewrite every BasicBlock's identifiers to reference the original id
instead of the SSA'd id.
This solves the shadowing problem and also lets us omit adding copies of
instructions.
The approach is very similar to what BuildHIR does to resolve break and continue
targets during IR construction:
* We annotate goto targets as either a break or a continue (during HIR
construction). This is necessary to reconstruct the right kind in codegen.
* Codegen continues to work by traversing the IR as if it were a tree, relying
on the `fallthrough` branches of if/switch to be able to visit the
consequent/alternate recursively and then emit the fallthrough branch.
* We track a Set of blocks that are scheduled to be emitted by some parent in
the tree. Nested ifs may all have the same fallthrough branch, which we only
want to emit once. This set helps us to know that a parent is already going to
emit some block, such that children can skip it.
* We also keep a stack of break targets that are in scope, and use this to
convert gotos appropriately, as either a break, continue, or nothing at all (for
example a switch case that falls through has no explicit syntax to model this
fall-through, the only option is to emit nothing for the goto).
* Then, if/switch have to carefully check whether each branch should be emitted
or not. For example, if the alternate is already scheduled to be emitted (by a
parent), then we emit a block with a break statement instead.
* Switch in particular is tricky, because we need to know that subsequent cases
are scheduled, but only for preceding blocks. So we visit the cases in reverse
order (not surprisingly, we do the same thing during IR construction for similar
reasons!).
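A minimal sketch of the goto conversion, assuming every enclosing construct on the stack is a loop or a switch, so an unlabeled `break` resolves to the innermost entry and an unlabeled `continue` to the innermost loop. The `Construct` shape, the `codegenGoto` name, and the `next` parameter (the block codegen will emit immediately afterwards, a simplification of the scheduled-set check) are hypothetical.

```typescript
type BlockId = string;

// An enclosing loop or switch during the tree walk (hypothetical, simplified shape).
type Construct = {
  kind: "loop" | "switch";
  label: string; // label to use if an unlabeled jump would be ambiguous
  breakBlock: BlockId; // block after the construct (its fallthrough)
  continueBlock: BlockId | null; // loop test block; null for a switch
};

function innermostLoop(stack: Construct[]): Construct | undefined {
  for (let i = stack.length - 1; i >= 0; i--) {
    if (stack[i].kind === "loop") return stack[i];
  }
  return undefined;
}

// Converts `goto target` into the minimal JS statement, or null when no statement is needed.
function codegenGoto(
  stack: Construct[],
  target: BlockId,
  variant: "break" | "continue",
  next: BlockId | null,
): string | null {
  // Control already falls into `target`, e.g. a switch case falling through to the next case.
  if (target === next) return null;
  for (let i = stack.length - 1; i >= 0; i--) {
    const construct = stack[i];
    if (variant === "break" && construct.breakBlock === target) {
      // An unlabeled break exits the innermost loop/switch, i.e. the top of the stack.
      return i === stack.length - 1 ? "break;" : `break ${construct.label};`;
    }
    if (variant === "continue" && construct.continueBlock === target) {
      // An unlabeled continue targets the innermost loop, skipping any switches.
      return construct === innermostLoop(stack) ? "continue;" : `continue ${construct.label};`;
    }
  }
  throw new Error(`no enclosing construct for ${variant} to ${target}`);
}

// Models `bb0: while (a) { bb3: switch (b) { ... } }`.
const stack: Construct[] = [
  { kind: "loop", label: "bb0", breakBlock: "bb9", continueBlock: "bb1" },
  { kind: "switch", label: "bb3", breakBlock: "bb8", continueBlock: null },
];
console.log(codegenGoto(stack, "bb8", "break", null)); // break;
console.log(codegenGoto(stack, "bb9", "break", null)); // break bb0;
console.log(codegenGoto(stack, "bb1", "continue", null)); // continue;
console.log(codegenGoto(stack, "bb5", "break", "bb5")); // null
```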
The bookkeeping is a bit finicky but this works reliably. There are some cases
where we could try to emit an unlabeled break instead of a labeled break, or
avoid emitting a label at all (if nothing will explicitly break to that label),
but overall the generated code is readable enough that I'm inclined to ship and
iterate. I'm open to feedback though, as always!
Reverts #726, which added an early optimization to the SSAify pass that skipped
phi creation when there was only one unique operand. This is no longer necessary
now that #739 added a phi elimination pass.
This is an alternate take on phi elimination to the one we pursued over VC with
@poteto driving. This version exploits the RPO ordering of blocks to do phi
elimination in a single pass when there are no loops, and to minimize repeated
visits when there are loops. The main difference is when redundant phis are
removed. Rather than eagerly walking through the CFG for each pruned phi to
rewrite its uses, we build up a mapping of rewritten identifiers. As we walk
through subsequent instructions, we rewrite each place based on that mapping. We
continue cycling through the blocks so long as a given iteration *both* added
new rewrites (meaning there may be subsequent uses to rewrite) *and* there are
back-edges. With no loops this results in a single visit of each block and of
each instruction, but even with loops this is bounded.
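A minimal sketch of this approach, assuming blocks are already in reverse postorder and that a phi is redundant when, ignoring references to itself, all of its operands are the same identifier. The `BlockInfo`/`PhiNode` shapes, string identifiers, and the precomputed `hasBackEdges` flag are hypothetical simplifications.

```typescript
type Id = string;
type PhiNode = { result: Id; operands: Id[] };
type BlockInfo = { phis: PhiNode[]; instructions: { operands: Id[] }[] };

// Follows the rewrite chain to its final replacement (e.g. x$3 -> x$1 -> x$0).
function resolve(rewrites: Map<Id, Id>, id: Id): Id {
  let current = id;
  while (rewrites.has(current)) current = rewrites.get(current)!;
  return current;
}

// `blocks` must be in reverse postorder; `hasBackEdges` is true if the CFG contains loops.
function eliminateRedundantPhis(blocks: BlockInfo[], hasBackEdges: boolean): Map<Id, Id> {
  const rewrites = new Map<Id, Id>();
  let addedRewrite = false;
  do {
    addedRewrite = false;
    for (const block of blocks) {
      const kept: PhiNode[] = [];
      for (const phi of block.phis) {
        phi.operands = phi.operands.map((op) => resolve(rewrites, op));
        // Redundant: ignoring self-references, every operand is the same identifier.
        const unique = new Set(phi.operands.filter((op) => op !== phi.result));
        if (unique.size === 1) {
          rewrites.set(phi.result, Array.from(unique)[0]);
          addedRewrite = true;
        } else {
          kept.push(phi);
        }
      }
      block.phis = kept;
      // Rewrite uses in instructions with every mapping learned so far.
      for (const instr of block.instructions) {
        instr.operands = instr.operands.map((op) => resolve(rewrites, op));
      }
    }
    // Another pass can only help if something changed and a back-edge can carry it backwards.
  } while (addedRewrite && hasBackEdges);
  return rewrites;
}

// A loop whose body never reassigns x: the header phi only looks redundant after the body phi is pruned.
const blocks: BlockInfo[] = [
  { phis: [], instructions: [] }, // bb0: defines x$0
  { phis: [{ result: "x$1", operands: ["x$0", "x$3"] }], instructions: [] }, // bb1: loop header
  { phis: [{ result: "x$3", operands: ["x$1", "x$1"] }], instructions: [{ operands: ["x$3"] }] }, // bb2
];
const rewrites = eliminateRedundantPhis(blocks, true);
console.log(blocks[2].instructions[0].operands); // uses of x$3 now refer to x$0
```

In this example the first pass prunes the phi in bb2, the second pass prunes the header phi in bb1 because the back-edge operand now resolves to `x$1` itself, and a third pass finds nothing new and stops.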
This diff adds styling to the compiler options editor. The floating input/output
toggle button on small screens now spans the bottom of the screen, so that it
doesn't block the compiler options.
Height overflows when adjusting screen size are complicated by Monaco Editor and
will be addressed in a later diff.
Test plan:
Start the Playground and check the new look of the compiler options editor
beneath the output section.