There are a bunch of ways we can go about converting from the input HIR into a
final form that has the preamble inserted and memoized blocks of code wrapped
with change detection and caching. This is just one way, it might not be the
ideal way. In any case, this pass converts HIRFunction -> ReactiveFunction. The
latter is a recursive (tree-shaped) data structure that attempts to represent
blocks each of composed of scopes or instructions, where scopes are themselves
composed of blocks etc. The idea is a) this makes it easy to visualize the
structure and check that the scopes and their dependencies are correct and b)
this is a really nice form for adding the memoization code. We can convert from
a ReactiveFunction back to an HIRFunction, wrapping each scope in the
appropriate if checks and caching.
## Example
Consider the following example, which has 2 main scopes: an outer one for `x`
and an inner one in the consequent for `y`:
```javascript
function foo(a, b, c) {
const x = [];
if (a) {
const y = [];
y.push(b);
x.push(<div>{y}</div>);
} else {
x.push(c);
}
return x;
}
```
## Output
The new builder constructs a ReactiveFunction for this example along the lines
of the following (note that inputs are always empty bc we don't collect those
yet):
```
{
scope @0 [1:11] inputs=[] {
[1] Const mutate x$11_@0[1:11] = Array []
[2] if (read a$8) {
scope @1 [3:5] inputs=[] {
[3] Const mutate y$12_@1[3:5] = Array []
[4] Call mutate y$12_@1.push(read b$9)
}
scope @2 [5:6] inputs=[] {
[5] Const mutate $13_@2 = "div"
}
scope @3 [6:7] inputs=[] {
[6] Const mutate $14_@3 = JSX <read $13_@2>{freeze y$12_@1}</read $13_@2>
}
[7] Call mutate x$11_@0.push(read $14_@3)
} else {
[9] Call mutate x$11_@0.push(read c$10)
}
}
[10] return x$11;
}
```
This shows the hierarchy: there's an outer scope, `@0` to compute `x` (the first
scope), then within the if consequent there's another scope, `@1`, to compute
`y`. We have some technically extraneous scopes to compute the JSX element; that
can be cleaned up with a bit more refinement.
With this structure — and the inputs and outputs of each scope filled in — we
can convert to code in a straightforward manner. Each scope turns into a block
along the lines of the following:
(note here we use strings to index the cache, in reality these would be ints)
```javascript
// one change variable pet input:
let c_a = a !== $['a'];
...
// one variable for each output:
let x;
...
// if (changed) { recompute } else { use-cache }
if (c_a || ... ) {
x = ...;
// one assignment per output
$['x'] = x;
// update cache per input
$['a'] = a;
...
} else {
// one assignment per output
x = $['x'];
...
}
```
TreeVisitor didn't distinguish between the type of a block and the type of an
item that can occur within a block - this was fine for Codegen which can use
`t.Statement` for both of those values. However, the upcoming scope construction
needs to distinguish instructions in a block from a block itself, so this PR
adds a new type parameter.
Per the title, this PR adds support for assignment expressions in update
clauses. This was mostly fixed by the previous diff to improve value block
handling, and there's only a bit more to do here to allow a "value block" that
doesn't produce a value (we need a better name).
This is a pre-req to construct reactive scopes in #857. "Value blocks" such as
`for` init/test/update and `while` test need to be consistently wrapped in
enter/leave calls so that we can extend the range of values properly. We also
need to handle `for` init blocks a bit differently, since they allow variable
declarations but not other types of statements.
This PR ensures that we use consistent methods for handling value blocks (`for`
test/update and `while` test) and treats `for` init as a new type with its own
enter/append/leave visitor functions.
Previously, this step just set the mutable range of any alias set including any
mutation to the end of the last mutable range of any of the containing
identifiers.
This change makes it so that the ends are only updated of the ranges that end
before the last mutation.
Fixes#852
Fixes https://github.com/facebook/react-forget/pull/858#discussion_r1044626137.
The case is
```javascript
function foo() {
let x$1 = 1;
let y = 2;
if (y === 2) {
x$2 = 3;
}
x$3 = phi(x$1, x$2)
if (y === 3) {
x4 = 5;
}
x$5 = phi(x$3, x$4);
y = x$5;
}
```
What happens here is that there are two _sequential_ phis for `x`. Previously
when we encountered the second phi we would find that there is no `let`
declaration for the phi or any of its operands, and create a new one before the
second `if`. That's incorrect, these should all merge into a single `x`
declaration. We now look up the phi operands to see if they are part of a
previous phi, and merge them correctly.
Reorders LeaveSSA so that it runs before we begin evaluating reactive scopes.
Note that reactive scopes must span the full construction of each variable — for
variables with a phi, this must span the declaration and all assignments of the
phi operands. And that's exactly what the new LeaveSSA does! LeaveSSA removes
phi nodes and ensure that all versions of a variable which flow into a phi have
been assigned a single canonical identifier (with an appropriate mutable range).
This PR includes this and some related changes:
* Reorders the pass
* Changes hir-test to print the final HIR, eg just prior to codegen
* Teaches LeaveSSA to update the mutable range of the canonical identifiers it
assigns, based on the min/max of the variables assigned.
A wise Joe once said, "We only care about observed aliasing".
Instead of performing aliasing for fields and non fields together, split the
analysis to happen over separate passes. Similarly split inferring mutable
lifetimes pass for fields and non fields.
Now, we can identify the aliases of fields that are *not* mutated and only alias
the ones that do mutate.
The algorithm is roughly as follows:
1. Build the set of aliases for non fields
2. Infer mutable ranges for all instructions except aliasing fields
3. Infer mutable ranges for all aliased instructions based on the alias set
calculated in step 1 (this doesn't include aliasing fields)
4. Extend the set of aliases (calculated in step 1) to include fields only if
the field or the receiver is mutated after the aliasing, ie, if _mutability is
observed_.
5. Run infer mutable ranges again for all instructions including the fields that
were aliased in the previous step.
6. Run infer mutable ranges for all aliased instructions including the fields
that were aliased.
## Problem
The previous version of LeaveSSA used a very simple approach in which
identifiers stored their pre-ssa id, and LeaveSSA restored this id back. The
upside of this approach is that it's very simple and trivially correct (assuming
no reordering of code). The downside is that after running LeaveSSA we lose all
information about which versions of variable declarations are distinct, and
which might merge together in a phi. That information is really useful for scope
analysis! Consider this input (variables are numbered as they would be in SSA
form):
```javascript
function foo(a, b, c) {
let x$1 = null;
if (a) {
x$2 = b;
} else {
x$3 = c;
}
x$4 = phi(x$2, x$3);
return x$4;
```
The current LeaveSSA assigns all 4 variables back to `x$1`, with a single let
declaration at `let x$1 = null`. However, from a reactive scopes perspective,
there are really just 2 versions of x: the initial x$1 (defined and never used)
and then x$2, x$3, and x$4, which have to be merged into a single scope because
they are part of a phi. In other words, we can't independently compute x$2, x$3,
or x$4 - if any of their inputs changes, we have to redo all the computation.
However, the existing structure makes it difficult to figure out the correct
starting point for this scope — there is no initial `let` declaration that we
can refer to.
Instead, we can represent the program as follows after LeaveSSA, and then use
this form for scope analysis:
```javascript
function foo(a, b, c) {
const x$1 = null; // NOTE: rewritten to const
let x$2; // synthesized declaration to allow later reassignment
if (a) {
x$2 = b;
} else {
x$2 = c;
}
return x$2;
```
Note that there are only 2 versions of x, and we have synthesized a variable
declaration for x$2 at the appropriate scope. Our scope analysis can then
determine that the range of x$2 is from the declaration to the end of the if.
## Approach
This pass does two main rewrites:
* For variables that do *not* appear as a phi id or operand, it rewrites the
declaration to be `const`. You can see this above for x$1.
* For variables that *do* appear as a phi or operand, it synthesizes a new `let`
binding at the appropriate scope (ie, in the appropriate block), and updates all
other operands from the phi to use the same id for the variable. You can see
this above for x$2, x$3, and x$4.
Note that the let binding is generated at the narrowest scope possible. In this
example, we generate distinct let bindings for the other if and else branches:
```javascript
function foo(a, b, c) {
let x = null;
if (a) {
// we generate a `let x$2` here
if (b) {
x = 0; // becomes x$2
} else {
x = 1; // becomes x$2
}
x // becomes x$2
} else {
// we generate a `let x$3` here
if (c) {
x = 2; // becomes x$3
} else {
x = 3; // becomes x$3
}
x; // becomes x$3
}
}
```
Because the different x values from the outer if/else can never join in a phi,
we can treat them as independent variables and (re)compute them independently.
The algorithm works by iterating in reverse-postorder, and looking ahead at
fallthrough blocks to find phi nodes that may need a let declaration (see above
example of where these are generated). It also tracks variables which _don't_
participate in a phi so that it can rewrite their declarations to `const`.
## TODO
This PR does *not* yet work for cases where there is unconditional assignment
within a `while` test condition. That would technically create a distinct
version of the variable that shadows the value for the loop, and you can't have
variable declarations in a while test condition.
That case already doesn't work, though, so i'm punting on it for now until we
figure out a bit more around
"value" blocks. We have some good options, like desugaring to a `for(;;)` and
manually implementing the while semantics in that case.
The changes here are pretty significant and there's a bunch more left:
- support for with any of `<init>`, `<test>` or `<update>` empty. - support for
with `<init>` as `Expression` instead of VariableDeclaration` node - support
assignment expressions in `<update>`, this seems like it might require further
new abstractions to allow something like a block to codegen into a single
expression.
The instructions of a while test node cannot just be pushed to the previous
block. This creates a new block for the test node and then during code gen
converts the statements pushed to the "value block" into expressions.
This adds a field sensitive, flow-insensitive, context-insensitive alias
analysis for lvalue aggregates.
In the future, InferMutableLifetimes can refine it's analysis using these field
sensitive alias sets.
Currently AbstractState.alias performs three operations: - Reading a value -
Storing the value - Updating aliasing (in the DisjointSet)
This commit makes AbstractState.alias only responsible for updating the alias
information. The rest of the operations are split into separate functions.
Moves the existing alias set building logic from inferMutableLifetimes to a
separate pass.
Additionally maintain abstract state to refine aliasing to not include
primitives.
HIRTreeVisitor previously passed a string label (for certain blocks). This
changes to pass the raw BlockId, and have codegen convert that to a string. I'm
not sure if we'll need this but it would be helpful for eg visiting the IR and
emitting a new IR, while mapping block ids forward. Even if we don't need that
it makes sense for Codegen to decide how to convert a block id into a label
(which has to obey the rules of an identifier, not the visitor's concern).
They're fairly related, but I figured it's worth keeping more examples.
- For the `while` example we need to codegen into a single expression. - For the
expression with contained assignment we need to either keep the SSA ids around
or re-create a similar expression during codegen.
The `automaticLayout` config option for Monaco causes it to remeasure itself,
and the parent container's height being longer than the screen causes it to
constantly grow infinitely. You can observe this bug by going to the playground,
expanding any tab, and watch the scrollbar grow as the editor quickly grows to
ridiculous heights and your laptop starts glowing red hot and̶͙̕ H̴͉͘e comes
t̵͙́o ̴̜̿de̷̼̚s̷̻̍ec̶̮͒rate all knowled̵̥̆ge
The argument was unused and a confusing boolean argument that's easy to mix up.
Suggesting to remove it until we see a need for it at which point we might want
to introduce an enum to make the argument more obvious.
This is a follow-up to merging ranges. I realized that we need to mark terminal
ids as visited _before_ processing the branches of that terminal (whereas before
we were marking terminal ids only _after_ processing the branches). That exposed
another bug where interleaving could fail to be detected (with an error,
thankfully) if one of the branches had completed already.
Our small test suite is already really good!
This demonstrates a situation we don't handle well today. The basic structure is
that you have some variable defined at the top level, then some control flow
like if/switch where _all_ branches reassign the variable, then some code after
that references the resulting phi node:
```javascript
let x1;
// ... mutate/read x1
if (cond) {
x2 = {};
} else {
x3 = {};
}
x4 = phi(x2, x3);
```
We currently group x3, x3, and x4 into a scope together, but note that...there's
no `let` declaration for any of those! This means that it looks like the scope
for x2 and x3 start in the consequent/alternate, but the true scope spans from
before-after the if. I'm inclined to say that LeaveSSA should run _before_ scope
analysis, and produce something like the following in this case:
```javascript
let x1;
// ...mutate/read x1
let x2; // new variable declaration for the new version of x
if (cond) {
x2 = {};
} else {
x2 = {};
}
x2;
```
This then allows us to construct a correct range for x2, which starts in the
other block.
- [x] Merge scopes that are interleaved
- [x] Merge scopes if they both cross control-flow boundaries together
- [x] Don't merge scopes that strictly shadow
Still WIP because I want to double-check and see if i can find a simpler
algorithm for this. But it works.