Previously, we would drop directives inside a component or hook but this is
problematic with reanimated which uses `'worklet'` to mark components from
compilation.
This PR adds a directive to HIRFunction and ReactiveFunction and codegens the
directive add the end. No processing is done on the directives themselves.
Babel seems to store the directives on a BlockStatement, rather than on the
Function but I've stored it on the Function types because we only support
compiling functions and the spec defines directives as occuring in the initial
statement list of a function: > A Directive Prologue is the longest sequence of
ExpressionStatements > occurring as the initial StatementListItems or
ModuleItems of a > FunctionBody, a ScriptBody, or a ModuleBody and where each >
ExpressionStatement in the sequence consists entirely of a > StringLiteral token
followed by a semicolon.
RFC: we can either retain break/continue target ids (instead of pruning them in
`buildReactiveFunction`) or re-implement the same logic in `propagateScopeDeps`
(ignore implicit breaks; match unlabeled break / continues to their closest loop
/ while parent terminal).
If we go ahead with this approach, I'll clean up this PR (add relevant types and
comments)
---
Reusing optionalMemberExpression nodes recently led to a bug when compiling
Forget playground.
```js
// the two a?.b's here should be different nodes!
if (a?.b !== $[0]) {
// ...
$[0] = a?.b;
}
```
Forget playground uses `babel-plugin-react-forget` and `next/babel`. Reusing the
same node in two positions in the AST lead to invalid mutations:
- the first `a?.b` is visited and transpiled to `a === void 0 ? ...`, which (1)
inserts nodes between the original node and its parent and (2) mutates `a?.b` in
place to a non-optional call
- the second `a?.b` in source gets updated to `a.b` and does not get visited
again
```js
// Source in `EditorImpl.tsx`
compilerOutput.kind === "err" ? compilerOutput.error.details : []
// Forget transformed:
if ($[2] !== compilerOutput.kind || $[3] !== compilerOutput.error?.details) {
t4 = compilerOutput.kind === "err" ? compilerOutput.error.details : [];
$[2] = compilerOutput.kind;
// this is good!
$[3] = compilerOutput.error?.details;
$[4] = t4;
} else {
t4 = $[4];
}
// After next/babel
if ($[2] !== compilerOutput.kind || $[3] !== ((_compilerOutput$error =
compilerOutput.error) === null || _compilerOutput$error === void 0 ? void 0 :
_compilerOutput$error.details)) {
t4 = compilerOutput.kind === "err" ? compilerOutput.error.details : [];
$[2] = compilerOutput.kind;
// Oh no!!
$[3] = _compilerOutput$error.details;
$[4] = t4;
} else {
t4 = $[4];
}
```
<img width="553" alt="Screenshot 2024-03-19 at 4 41 15 PM"
src="https://github.com/facebook/react-forget/assets/6425824/e87ee704-6c67-4e10-824b-71e97e7e19f5">
Slightly improves source locations for JSX elements so that the opening and
closing tag have distinct locations that match up with source. The identifier
itself within the closing tag still has the wrong location, but at least this is
an improvement.
Doesn't fix the fbt thing but it was worth a try.
---
Previously, we always emitted `Memoize dep` instructions after the function
expression literal and depslist instructions
```js
// source
useManualMemo(() => {...}, [arg])
// lowered
$0 = FunctionExpression(...)
$1 = LoadLocal (arg)
$2 = ArrayExpression [$1]
$3 = Memoize (arg)
$4 = Call / LoadLocal
$5 = Memoize $4
```
Now, we insert `Memoize dep` before the corresponding function expression
literal:
```js
// lowered
$0 = StartMemoize (arg) <---- this moved up!
$1 = FunctionExpression(...)
$2 = LoadLocal (arg)
$3 = ArrayExpression [$2]
$4 = Call / LoadLocal
$5 = FinishMemoize $4
```
Design considerations:
- #2663 needs to understand which lowered instructions belong to a manual
memoization block, so we need to emit `StartMemoize` instructions before the
`useMemo/useCallback` function argument, which contains relevant memoized
instructions
- we choose to insert StartMemoize instructions to (1) avoid unsafe instruction
reordering of source and (2) to ensure that Forget output does not change when
enabling validation
This PR only renames `Memoize` -> `Start/FinishMemoize` and hoists
`StartMemoize` as described. The latter may help with stricter validation for
`useCallback`s, although testing is left to the next PR.
#2663 contains all validation changes
Fixes T180504437. We expected `<fbt:param>` to always have no surrounding
whitespace or have both leading and trailing whitespace, it can have one but not
the other, though such cases are rare in practice.
Repro from T180504728 which reproduced internally and on playground, neither of
which have #2687 yet. That PR (earlier in this stack) already fixes the issue,
so i'm just adding the repro to help prevent regressions.
I addressed some of the cases that lead to this invariant but there were still
more. In this case, we have scopes like this:
```
scope @1 declarations=[t$0] {
let t$0 = ArrayExpression []
if (...) {
return null;
}
}
scope @2 deps=[t$0] declarations=[t$1] {
let t$1 = Jsx children=[t$0] ...
}
```
Because scope 1 has an early return, PropagateEarlyReturns wraps its contents in
a label and converts the returns to breaks:
```
scope @1 declarations=[t$0] earlyReturn={t$2} {
let t$2
bb0: {
let t$0 = ArrayExpression []
if (...) {
t$2 = null;
break bb0;
}
}
}
scope @2 deps=[t$0] declarations=[t$1] {
let t$1 = Jsx children=[t$0] ...
}
```
But then MergeReactiveScopesThatInvalidateTogether smushes them together:
```
scope @1 declarations=[t$1] earlyReturn={t$2} {
let t$2
bb0: {
let t$0 = ArrayExpression [] // <--- Oops! We're inside a block now
if (...) {
t$2 = null;
break bb0;
}
}
let t$1 = Jsx children=[t$0] ...
}
```
Note that the `t$0` binding is now created inside the labeled block, so it's no
longer accessible to the Jsx instruction which follows the labeled block. This
isn't an issue with promoting temporaries or propagating outputs, but a simple
issue of the labeled block (used for early return) introducing a new block
scope. The solution here is to simply reorder the passes so that we transform
for early returns after other optimizations. This means the jsx element will
basically move inside the labeled block, solving the scoping issue:
```
scope @1 declarations=[t$1] earlyReturn={t$2} {
let t$2
bb0: {
let t$0 = ArrayExpression [] // ok, same block scope as its use
if (...) {
t$2 = null;
break bb0;
}
let t$1 = Jsx children=[t$0] // note this moved inside the labeled block
}
}
```
This was an oversight in codegen. The entire pipeline supports multiple values
in a for initializer, but codegen was dropping all but the first initializer.
For T181507827 — adds an invariant in codegen when emitting identifiers to
ensure that we only create babel Identifier nodes for nodes that the compiler
has explicitly promoted to valid, named identifiers. This means that we'll fail
for unnamed temporaries (previously caught), as well as promoted temporaries
that somehow didn't get renamed by RenameVariables (newly caught).
Adds a visitor to collect all the globals that are referenced within the
function, and then uses this list to avoid synthesizing variables with
conflicting names. This is used in both RenameVariables (for promoted
temporaries) and Codegen (for `$` and change variables only, so far, but this
can be extended in follow-ups).
This is a key part of avoiding generating conflicting names in our output. To
start, RenameVariables now returns a Set of the unique identifier names that
exist in the function. Codegen uses this to avoid generating duplicate names for
change variables and for the `$` useMemoCache variable. Rather than always emit
`$` or `c_N`, codegen checks that this name would not conflict and appends an
incrementing suffix until it finds a unique name.
Note that it's still possible for us to generate conflicts with global
variables, both during RenameVariable and Codegen. The next step will be to
avoid conflicts with globals.
Another title for this PR could be "Yet another reason for HIR-everywhere"
ReactiveFunctionVisitor doesn't traverse into HIRFunctions from
FunctionExpression and ObjectMethod values. This means that
PromoteUsedTemporaries and RenameVariables also weren't traversing into such
functions, and those values weren't getting promoted and renamed correctly.
This PR updates ReactiveFunctionVisitor with a method that can optionally be
invoked to traverse an HIRFunction and call the appropriate visitor methods.
PromoteUsedTemporaries and RenameVariables invoke this to ensure they visit all
places, even in nested HIRFunctions.
I realized that codegen still had a fallback for generating identifier nodes for
unnamed temporaries. This PR updates codegen to throw if it needs to generate an
identifier for a temporary, and updates earlier passes to promote temporaries to
named values in all the cases that were missed:
* BuildHIR needs to promote temporaries for temporaries in destructuring
bindings and catch clause bindings
* PromoteUsedTemporaries has to promote temporaries for destructured function
parameters or function params that are context variables.
Uses an enum for Identifier.name to distinguish originally named identifiers vs
promoted temporaries. An opaque type for the named identifier variant makes it
hard to accidentally create that type.
filepath
Internal rollout currently has a good number of test failures.
`enableEmitInstrumentForget` can help developers understand which functions /
files they should look at:
```
// input
function Foo() {
userCode();
// ...
}
// output
function Foo() {
if (__DEV__ && inE2eTestMode) {
logRender("Foo", "/path/to/filename.js");
}
const $ = useMemoCache(...);
userCode();
}
```
This invariant interpolated values into the `reason` which prevent our internal
tooling from grouping related errors. This PR updates to make the reason static
and interpolate the description.
Per the previous PR, we don't have a way to rewrite an arbitrary subset of a
ReactiveFunction into a function expression, since FunctionExpression's contents
is still in HIR.
While long-term our plan is to move to HIR everywhere, this PR adds a stopgap of
adding a ReactiveFunctionValue variant of ReactiveValue. As a reminder,
ReactiveValue is a union of (HIR) InstructionValue | SequenceExpression |
LogicalExpression | ConditionalExpression.
For now i did a first stab at the visitors and transforms with the idea that:
* By default, visitors/transforms _don't_ look into these function expressions,
since we didn't previously traverse into (HIR-based) FunctionExpression either
* But there is a visitor/transform method that you can override if you need to.
The hook guards are incompatible with using a forget-runtime. Specifically,
forget-runtime needs to make a call to `useState()` or some other hook to attach
data to the fiber, but all the builtin hooks are overridden to disallow calling
them outside of explicit boundaries. We'd either have to wrap the useMemoCache
call in a push/pop to allow it to call other hooks, or as in this PR, just move
it outside the enforcement.
It's starting to get complex just with a couple of extra
passes — we either need to substantially extend the HIR or (as i've done so far)
pass information from early passes to later ones. This PR changes things so that
very early in the babel plugin we fork into a separate mode. Forest has
its own `compileProgram()` equivalent, its own pipeline, its own codegen, etc.
Fixes the case from the previous PR by using a different sentinel for
uninitialized cache values and early returns. I confirmed with console.log that
the reactive scope for `x` only evaluates on the first execution, after which we
figure out that we don't need to execute it again.
Adds support for early returns within reactive scopes, behind a new feature
flag. The flag is off by default, where this case continues to throw a Todo
bailout.
Since implementing a sketch of the codegen in the previous PR I realized that
it's easy enough to implement the more optimal output, so i've updated that
here. Rather than both the if and else branch of the reactive scope having an
"if the return value was not a sentinel return it" check, we instead make the
return temporary a proper declaration of the reactive scope. Then, since it's
actually an output it's available in the outer block scope, and we could do a
single if-return after the reactive scope.
Edit: see comment below for thoughts on test cases.
Implements codegen for reactive scopes with early returns, though we don't ever
construct such a case yet. See comments in the code. There is a slightly more
optimal output that would require a larger refactor (also described in code
comments), for now i'm starting with the simpler approach since this is
relatively rare so we don't need to optimize code size / runtime as much.
Adds a new `earlyReturnValue` property on ReactiveScope which will be set if the
scope had one or more early returns, with information about the temporary
identifier that the early return value will be assigned to, as well as the label
to be used for breaking (to simulate the early-return). The next PR shows the
intended codegen.
The previous PR introduced `memoize` instructions whose lvalues aren't used, but
which can't be pruned by DCE due to pipeline ordering. Here we change to make
memoize an instruction intended for its side effects only, and prune during
codegen.
Adds an option to preserve existing memoization guarantees for values produced
with useMemo and useCallback. We still discard the calls to these hooks, but we
preserve the information that the value is frozen at that point in the program.
Because these values are produced solely within the useMemo/useCallback
callback, their mutation cannot have any interspersed hook calls. This means
that the values mutable range will never span a hook and end at the point of the
useMemo, ensuring that they are memoized at the same point.
The main things that can change (relative to the orignal code) are:
* Forget will infer a precise set of dependencies, ignoring the user-provided
values. In practice this should only occur if the original code had a lint
violation, which Forget would bail out on. So in practice this shouldn't happen
unless the code doesn't use the React linter.
* Forget may start the memoization block earlier than the developer did if other
values are mutated along with the value being produced. This can cause
memoization to fail, but only in situations where it would have failed
previously:
```javascript
const a = [];
useFoo();
const b = useMemo(() => {
const c = a;
c.push(1);
return c;
}, [a]);
```
In this example (sans Forget) the useMemo will invalidate on every render
because `a` will always be a new array and its listed as a dependency of the
useMemo. Forget would correctly determine that the memoization would have to
work as follows:
```javascript
let c;
if (...) {
const a = []
useFoo(); // OOPS we made a hook call conditional
const t0 = a;
t0.push(1);
c = t0;
...
} else {
c = $[...]
}
```
Because this is invalid, Forget would (later in the pipeline) strip out this
memoization block and (as with the original) leave `c` un-memoized.
In this same example, removing the hook would cause Forget to be able to memoize
a value that wasn't memoized before:
```javascript
const a = [];
const b = useMemo(() => {
const c = a;
c.push(1);
return c;
}, [a]);
```
This invalidates every render without Forget, but would memoize correctly with
Forget (it would expand the memoization block to include the declaration of
`a`).
The previous PR converts reactive scopes to normal instructions, so that Forest
mode won't have any scopes left by the time we reach codegen. This PR removes
the now-unused codegen logic for forest.
For Forest, we previously converted reactive scopes into derived signals during
Codegen. I'm moving this to a separate pass primarily to keep codegen simple
since there's enough complexity just dealing with core JS semantics. Ideally
we'd do a similar setup even for regular Forget, ie lower reactive scopes just
prior to codegen.
At the same time i also reordered the forget passes to be just before codegen,
and cleaned things up a bit. For state lowering, we now just rewrite `useState`
-> `createState`, because we actually need to keep around the setter function to
trigger scheduling updates in addition to writing the signal value.
---
I modeled guards as try-finally blocks to be extremely explicit. An alternative
implementation could flatten all nested hooks and only set / restore hook guards
when entering / exiting a React function (i.e. hook or component) -- this
alternative approach would be the easiest to represent as a separate pass
```js
// source
function Foo() {
const result = useHook(useContext(Context));
...
}
// current output
function Foo() {
try {
pushHookGuard();
const result = (() => {
try {
pushEnableHook();
return useHook((() => {
try {
pushEnableHook();
return useContext(Context);
} finally {
popEnableHook();
}
})());
} finally {
popEnableHook();
};
})();
// ...
} finally {
popHookGuard();
}
}
// alternative output
function Foo() {
try {
// check current is not lazyDispatcher;
// save originalDispatcher, set lazyDispatcher
pushHookGuard();
allowHook(); // always set originalDispatcher
const t0 = useContext(Context);
disallowHook(); // always set LazyDispatcher
allowHook(); // always set originalDispatcher
const result = useHook(t0);
disallowHook(); // always set LazyDispatcher
// ...
} finally {
popHookGuard(); // restore originalDispatcher
}
}
```
Checked that IG Web works as expected
Unless I add a sneaky useState:
<img width="705" alt="Screenshot 2023-12-05 at 6 44 59 PM"
src="https://github.com/facebook/react-forget/assets/34200447/3790bd76-7d71-44b5-a62e-f53256fb5736">
---
Prior to this PR, we were mutating functions after CodegenReactiveFunction
completes (in `Entrypoint/Program.ts`).
The reasoning for this separation was that we wanted to keep non-compiler logic
out of the core Pipeline. However, it made our code difficult to read and reason
about.
Open to other alternatives, like adding a pass after Codegen.
I did a double take when I thought we didn't handle returning the
error when reading the code and when I edited the code, typescript told
me that there's no need to return as creating the error will throw.
This PR makes it clear from the name of the function that we will throw.
Reactive scopes are currently preceded by individual variable declarations, one
for each of the scope's dependencies. Only after checking independently if each
of these dependencies has changed do we then do an "or" to check if we should
actually recompute the body of the scope.
But that's wasteful: it's more efficient to rely on `||` short-circuiting, and
recompute as soon as any input changes.
The main reason I can see not to do this is debugging. Having the change
variables makes it easy to see what changed. But at this point I think it makes
sense to optimize for code size and performance; we can always add back a
dev-only mode that uses change variables if that turns out to help debugging.
This morning @mofeiZ reminded me that our codegen doesn't really have any guards
against reordering, since temporaries are lazily emitted. We're relying on the
fact that our lowering and memoization carefully preserves order of evaluation,
such that delaying the instructions in codegen doesn't change semantics.
To help catch any mistakes with this, I had previously added code that reset the
codegen context's temporaries before/after exiting a reactive scope. That
ensured that temporaries from within the scope weren't accessible outside it.
This PR extends that approach to _all_ blocks, so that temporaries created
within a block aren't accessible outside it.
I'm also going to explore more actively resetting temporaries after they
"should" be used. There are a couple cases where temporaries are reused, though,
which we have to change first.