This is two things:
* A toy semantic analysis that handles a tiny subset of JS, including labeled
statements, labeled break/continue, and variable
declaration/reference/reassignment. This only exists as a way to prove out the
API for the more important bit:
* More importantly, this defines a data model for the semantic analysis results
and an API for building up the semantic analysis.
Subsequent diffs will replace the first bit (toy analysis impl), while keeping
the second part.
---
Changes in `@enableOptimizeFunctionExpressions` caused a bug in the last Forget
sync to VR Store. The repro can be summarized to something like this:
```js
function foo() {
const x = true; // some constant or global
// Add some branching for type inference
// This can be a Logical expression as well (e.g. `4 || 5`)
if (...) { }
// In this HIR block, SSA inserts a `x$2 = phi(x$1, x$1)`.
// EliminateRedundantPhiNodes needs to rewrite all references of `x$2` to `x$1`
const accessXInLambda = () => x;
return accessXInLambda;
}
---
Revives e2e test infra from #587.
- All React component-like functions are compiled.
- `yarn jest` runs each e2e test twice (forget and no forget)
Github Actions is already running `yarn test`, which includes all jest tests
```
Run yarn test
yarn run v1.22.19
$ yarn workspaces run test
> babel-plugin-react-forget
$ yarn jest && yarn snap:build && yarn snap
$ tsc && jest
PASS main src/__tests__/Result-test.ts
PASS main src/__tests__/DisjointSet-test.ts
PASS e2e with forget src/__tests__/e2e/hello.e2e.js
PASS e2e no forget src/__tests__/e2e/hello.e2e.js
Test Suites:
[4](https://github.com/facebook/react-forget/actions/runs/5732016200/job/15534129231?pr=1881#step:8:5)
passed, 4 total
Tests: 23 passed, 23 total
Snapshots: 11 passed, 11 total
Time:
6.1[5](https://github.com/facebook/react-forget/actions/runs/5732016200/job/15534129231?pr=1881#step:8:6)3
s
```
Using an arena allocator can be faster, but it comes with several challenges:
* It requires tediously tagging nearly every value with a lifetime. Any function
that has to deal with arena-allocated data (which is every meaningful function)
ends up with a lifetime parameter. Bleh. We also have to thread the allocator
itself wherever we need to allocate, though this is less of a problem since
_most_ functions already need the Environment and we can store the allocator
there.
* There is not yet widespread support in the Rust ecosystem for using custom
allocators with custom data types. This means that things like HashMap/Set and
IndexMap/Set can't be arena allocated. This means that either we have to add
support to these data types (by upstreaming or forking) or just accept that
we're only partially using the arena allocator.
* Finally, `bumpalo`'s `Box` cannot be moved out of, which turns out to be an
annoying limitation that i've already had to work around several times.
In the end i'm not sure arena allocators are worth it at this stage of the
project. Relay Compiler has been successful without one, and other Rust-based
internal compilers get by without them too.
The current heuristic to check if setState is called in render is based on
whether the lambda containing the call to setState has a mutable range that got
extended. This doesn't seem to work in all cases so this validation needs a bit
more work before we can turn it on by default
This includes a fix where the HermesToBabelAdapter was incorrectly outputting a
`ClassMethod` instead of `ClassPrivateMethod` in certain scenarios. This was
causing issues with our eslint plugin as it would crash on any file containing
private methods because a malformed AST was formed.
There was a bug in our internal Hermes to Babel adapter which caused some
malformed AST to be constructed, which babel would then validate as being
incorrect. That bug was fixed, but we still have to add this plugin so that
private class methods can be parsed by babel.
Tested internally
Adds GitHub actions to build and test the Rust version of Forget on every commit
to main. I didn't enable this on PRs for now just to avoid slowing down other
folks, this seemed like a reasonable compromise while we're still in the
experimentation phase. I test locally, and CI will catch if I or anyone else
slips up.
Rejects unknown fields during deserialization of helper structs (which don't
have a `type` field). This would be super useful for AST `Node` types, too, but
that requires implementing a custom deserializer so i'm punting on that for now.
Adds more AST types to handle the full ES2021 spec. At least, in theory I
defined the schema correctly, and we'll have to just fix any bugs as we
encounter them.
Updates to use our own codegen'd `Serialize` implementation for AST node types.
Notably, this allows us to ensure that we always emit a `type` field, even when
the type is obvious from context (serde doesn't do this) and avoid
double-emitting `type` fields (which serde can do if you try to force a tag to
be emitted). We still use the Deserialize impl because it works fine for our
purposes.
Adds almost all the remaining AST types from ES2015 to `estree`, and updates the
swc->estree->hir conversions accordingly. In a few places i punted with a
`Diagnostic::todo()`.
Ports InlineUseMemo to Rust, this is the only missing transform pass on the
pipeline up through constant propagation. I wanted to finish this so that we
could do benchmarking of the early phase of compilation. UseMemo does a bunch of
rewriting so it seemed worth comparing performance.
Included:
* Add swc -> estree and estree -> HIR conversions for arrow functions and call
expressions
* Add HIR definition for labeled terminals and call expressions
* Utility for inlining one function into another — note that this requires
remapping InstrIx operands since instruction indices will change. An alternative
would be to store all the instructions for both outer/inner functions in the
same array, in which case we wouldn't need to remap.
* Helpers for mutably iterating the operands of an instruction or terminal.
* The actual inline_use_memo() function. The overall logic is similar, i just
had to slightly shuffle the order of operations to satisfy the borrow checker.
Console warnings when running tests: ``` ts-jest[versions] (WARN) Version 5.1.3
of typescript installed has not been tested with ts-jest. ```
Our typescript version is ahead of what's supported in ts-jest. This PR updates
ts-jest to work with the latest typescript compiler.
This PR adds a Hole kind that can be present in both ArrayExpression and
ArrayPatterns.
This Hole type is not interesting for our inference passes and is skipped over
for all of the pipeline.
Forget doesn't understand the React namespace object and generates incorrect
code when compiling code that loads props from this namespace object.
This PR makes Forget bailout when we see a property load from React namespace
object.
Upgrades every dependency on `hermes-parser` and `prettier` to the versions that
we're using in the rest of our other codebases. Notably, these package versions
are necessary for our other codebases to make use of mapped types in Flow.
8d021e5d797c5ef3b4e18a0be735c04005f2ae7a configured `./forget` as a workspace
root, so these `yarn.lock` files in `app/*` and `packages/*` are no longer
relevant. This PR deletes them to reduce noise and confusion.
Ports `MergeConsecutiveBlocks` to Rust. This was a tricky one: as we iterate
through the blocks _if_ the block ends up being merged with its predecessor we
need to consume it and modify its predecessor block (ie, mutating two things
from the same data structure - shared mutation!). But if we _don't_ need to
merge, then we need to not drop the current block. Ie, we sort of need to
conditionally take ownership of the current block during iteration and put it
back.
I added a `BlockRewriter` helper type for this which has a helper to iterate
safely. It calls the iterator lambda, moving blocks one at a time into the
lambda. The lambda returns either `Keep(block)` to give the block back and keep
it or `Remove` to tell the rewriter to drop the block. Thanks to making the
Blocks data structure hold `Option<Box<BasicBlock>>` items, "moving" the block
actually just means nulling out the option and conceptually giving ownership of
the pointer to the callback - the data itself never moves.
This involved creating a custom `Blocks` wrapper type, which i had been putting
off doing. This cleans up a bunch of other logic around traversing blocks.
This PR adapts the `Diagnostic` type and helpers from Relay Compiler to Forget.
The main changes are:
* Removing some fields it doesn't seem we'll use for a while, if ever (like
machine-readable arbitrary key/value data)
* Switching from Relay Compiler's `Location` type to our SourceRange type
* Using the severity enum previously established in the `forget_build_hir`
crate, with Todo/Unsupported/InvalidSyntax/InvalidReact/Invariant variants
* Adding support for translating our `Diagnostic` into a `miette::Diagnostic` so
we can use miette's pretty printing
With the new Diagnostic type in place i updated the existing build_hir code to
use it and confirmed that the errors are now even nicer (when we attach extra
data to annotate labels):
<img width="860" alt="Screenshot 2023-07-14 at 3 10 00 PM"
src="https://github.com/facebook/react-forget/assets/6425824/9d29425a-938b-4872-b999-aa174a3c329a">
This addresses (or brings us closer to addressing) many of your comments on the
last diagnostics PR, @poteto!
Per the title, this updates the main readme file with a guide to contributing to
the Rust compiler, and ensures that we have a brief description of every crate
in local readme.md files. The `forget_hir` one is the most extensive and
describes the high-level design of the HIR.
I've primarily used what Cargo calls virtual workspaces, where the top-level
Cargo.toml just lists a bunch of packages and they each have their own
dependencies. This is fine, but i've noticed that more repos are using real
workspaces and they offer a bunch of benefits. You define the dependencies in
the top-level Cargo.toml and then can easily refer to them from multiple crates,
ensuring all the versions match up. It makes it easier to refer to other crates
in the workspace, too, because you define the path once at the root, then every
other crate can just say `forget_foo = { workspace = true }`.
I also renamed all the crates to be prefixed with `forget_`, in some cases
removing the redundant `hir` name, So `hir-optimization` became
`forget_optimization`, `hir-ssa` became `forget_ssa`. Also note the switch from
hyphenated names to underscores everywhere, since at the end of the day you have
to write the name with underscores in source code.
I also deleted the demo crate that i started with since we don't need it
anymore.
And finally, i added an explicit publish = false to all the crates just to
prevent mistakes.
This is a quick "good enough" first pass at computing function expression
context variables. It definitely needs to be overhauled, but it's enough to make
a lot of common cases work correctly.
First, this PR adds a hand-rolled Visitor trait for `estree`. Long-term that
should probably be code-generated, but there are some subtleties to it such as
the `visit_lvalue(callback)` helper which has to be wrapped around various calls
(or we need some other way to distinguish identifiers within lvalues from
identifiers within rvaluess). So for now it makes sense to hand-roll it until we
are more confident in exactly how it should work.
Given that visitor, i was able to port part of the existing
`gatherCapturedDeps()` to Rust with some modifications. Note that we assume
_something_ has run name resolution on the estree to match up identifiers to the
declaration they refer to. But we don't store information about parent scopes so
we can't walk up to check where things were defined. Instead we do the
following:
* Build up a list of all referenced identifiers
* Also build up a set of bindings defined in the function itself
* After visiting, filter our the first list to only include identifiers not
defined by the function itself, and to eliminate duplicates.
This covers the majority of cases: the most obvious gap is nested function
expressions though actually that should just work.
Updates eliminate_redundant_phis and constant_propagation to recurse into
function expressions. I also realized there was a bug in EliminateRedundantPhis
in which we wouldn't traverse into function expressions encountered after
finding a back edge, so i fixed that logic in both versions.
Updates `enter_ssa()` to recurse into function expressions. In the TS compiler
we use a single Builder instance and copy (references to) the function
expression's blocks into the builder. That type of sharing just does not play
nicely with Rust. But... we don't need to do that! We already know the context
variables of the function expression, so we can lookup each of them to find
their re-mapped identifier, and set that as the starting state for the entry
block of the function expression. That lets us use normal recursion and
otherwise not share any information between the outer and inner builders.
Of course to make this work we actually have to populate Function.context, but
the algorithm _should_ work.