David Alecrim 90fcf8539e fuzzy: Fix crash with Unicode chars whose lowercase expands to multiple codepoints (#52989)
Self-Review Checklist:

- [x] I've reviewed my own diff for quality, security, and reliability
- [x] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable

Closes #52973

## Problem

The file picker crashes with `highlight index N is not a valid UTF-8
boundary` when file paths contain Unicode characters whose lowercase
form expands to multiple codepoints. Turkish `İ` (U+0130) is the trigger
here: Rust's `char::to_lowercase()` turns it into `i` + combining dot
above (two codepoints). That expansion breaks the fuzzy matcher in two
ways:

1. The `j_regular` index mapping mixes the expanded lowercase index
space with the original character index space, so highlight positions
land on invalid byte boundaries.
2. The scoring matrices are allocated with the expanded length but
indexed with the original length as stride, so rows alias each other and
corrupt stored values.
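
The first failure mode can be reproduced outside the matcher. This is a minimal sketch, not the matcher's actual code: `offset_in_lowered` and `is_valid_highlight` are hypothetical helpers, and the sample path is an assumption chosen to make the mismatch visible. The sketch takes a byte offset found in the fully lowercased (expanded) string and checks it against the original string.

```rust
/// Hypothetical helper: byte offset of `needle` in the fully lowercased
/// form of `path`. The offset lives in the *expanded* index space.
fn offset_in_lowered(path: &str, needle: char) -> Option<usize> {
    path.to_lowercase().find(needle)
}

/// Hypothetical helper: true if `offset` could be used as a highlight
/// position in `path`, i.e. it falls on a UTF-8 character boundary.
fn is_valid_highlight(path: &str, offset: usize) -> bool {
    path.is_char_boundary(offset)
}
```

For the path `"\u{130}\u{fc}.txt"` (`İü.txt`), the original has 6 chars and the lowercased form has 7. `offset_in_lowered(path, '\u{fc}')` returns byte 3, which falls in the middle of `ü` in the original string, so `is_valid_highlight(path, 3)` is false.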

Users with Turkish locale filenames were hitting this on v0.229.0 and
v0.230.0 stable.

## Fix

I went with simple 1:1 case mapping: a `simple_lowercase` helper in
`char_bag.rs` that takes only the first codepoint from `to_lowercase()`
and drops any trailing combining characters. For `İ` this gives `i`,
which is what anyone would actually type in a search query. The same
function is used in the matcher, the char bag pre-filter, and both
query-lowercasing call sites (`paths.rs` and `strings.rs`).
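
A helper like this could look roughly as follows. This is a sketch based on the description above; the exact signature in `char_bag.rs` may differ, and `lowercase_query` is a hypothetical name used only for illustration.

```rust
/// Sketch of a 1:1 lowercase mapping: keep the first codepoint that
/// `char::to_lowercase` yields and drop any that follow.
fn simple_lowercase(c: char) -> char {
    // `to_lowercase` always yields at least one char; the fallback is
    // purely defensive.
    c.to_lowercase().next().unwrap_or(c)
}

/// Hypothetical query-side helper: lowercases with the same 1:1 mapping,
/// so the result has exactly as many chars as the input.
fn lowercase_query(query: &str) -> Vec<char> {
    query.chars().map(simple_lowercase).collect()
}
```

With this mapping, `simple_lowercase('\u{130}')` is `'i'`, and a three-character query stays three characters long after lowercasing.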

This gets rid of the `extra_lowercase_chars` BTreeMap, the `j_regular`
adjustment, and the matrix sizing discrepancy. The matcher now works
with a flat character array where `lowercase_candidate_chars.len() ==
candidate_chars.len()`, so there's no expanded-vs-original index space
to get wrong.
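
The length invariant is what makes highlight positions safe. A minimal sketch, assuming a `simple_lowercase` helper as described above; `candidate_chars` and `byte_offset` are hypothetical functions, not the matcher's real API:

```rust
/// Assumed 1:1 lowercase mapping (first codepoint of `to_lowercase`).
fn simple_lowercase(c: char) -> char {
    c.to_lowercase().next().unwrap_or(c)
}

/// Builds the original and lowercased char arrays for a candidate.
/// Both vectors always have the same length.
fn candidate_chars(candidate: &str) -> (Vec<char>, Vec<char>) {
    let original: Vec<char> = candidate.chars().collect();
    let lowered: Vec<char> = original.iter().map(|&c| simple_lowercase(c)).collect();
    (original, lowered)
}

/// Converts a char index into a byte offset in `candidate`. Because the
/// index comes from the same index space as the original string, the
/// offset is always a valid UTF-8 boundary.
fn byte_offset(candidate: &str, char_index: usize) -> Option<usize> {
    candidate.char_indices().nth(char_index).map(|(offset, _)| offset)
}
```

For `"\u{130}\u{fc}.txt"`, a match on `ü` is found at char index 1 in the lowercased array, and `byte_offset` maps that to byte 2, the start of `ü` in the original.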

I also fixed `CharBag::insert`, which used `to_ascii_lowercase()` and
silently ignored non-ASCII characters. A file like `aİbİcdef.txt`
wouldn't show up when searching `ai` because `İ` was never registered as
`i` in the bag. It now goes through `simple_lowercase` too.
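
The pre-filter difference can be illustrated with a toy bag. `ToyCharBag` below is a hypothetical, simplified stand-in (one bit per ASCII letter) and does not reflect the real `CharBag` layout; it only shows why the choice of lowercasing function matters.

```rust
/// Hypothetical, simplified bag: one bit per ASCII letter a-z.
#[derive(Clone, Copy, Default)]
struct ToyCharBag(u32);

impl ToyCharBag {
    /// Registers `c` after lowercasing it with `lower`; anything that is
    /// not an ASCII letter after lowercasing is ignored.
    fn insert(&mut self, c: char, lower: fn(char) -> char) {
        let c = lower(c);
        if c.is_ascii_lowercase() {
            self.0 |= 1u32 << (c as u8 - b'a');
        }
    }

    fn from_str_with(s: &str, lower: fn(char) -> char) -> Self {
        let mut bag = Self::default();
        for c in s.chars() {
            bag.insert(c, lower);
        }
        bag
    }

    /// True if every letter registered in `other` is also in `self`.
    fn is_superset(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Old behavior: non-ASCII characters pass through unchanged.
fn ascii_lowercase(c: char) -> char {
    c.to_ascii_lowercase()
}

/// Assumed new behavior: first codepoint of the full lowercase mapping.
fn simple_lowercase(c: char) -> char {
    c.to_lowercase().next().unwrap_or(c)
}
```

Building the bag for `aİbİcdef.txt` with `ascii_lowercase` leaves the `i` bit unset, so the query `ai` is filtered out before matching. With `simple_lowercase`, the bag is a superset of the query bag and the candidate reaches the matcher.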

The alternative was keeping full case folding and fixing the index
tracking with a `Vec<usize>` mapping expanded positions back to
originals. That would work but keeps the dual-index-space complexity
that caused these bugs, plus adds a per-candidate allocation for the
mapping vector.
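
For comparison, the rejected approach might look like the sketch below. `expand_with_mapping` is a hypothetical function written for illustration; it is not code from this change.

```rust
/// Sketch of the rejected alternative: full lowercase expansion plus a
/// mapping from each expanded position back to its original char index.
/// The mapping vector is the extra per-candidate allocation.
fn expand_with_mapping(candidate: &str) -> (Vec<char>, Vec<usize>) {
    let mut expanded = Vec::new();
    let mut to_original = Vec::new();
    for (original_index, c) in candidate.chars().enumerate() {
        // One original char may contribute several lowercase chars.
        for lowered in c.to_lowercase() {
            expanded.push(lowered);
            to_original.push(original_index);
        }
    }
    (expanded, to_original)
}
```

For `"\u{130}a"` this yields three expanded chars (`i`, U+0307, `a`) and the mapping `[0, 0, 1]`. Every later use of a match position has to go through `to_original`, which is the dual-index-space bookkeeping the simple mapping avoids.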

## Prior art

fzf uses Go's `unicode.To(unicode.LowerCase, r)`, which is simple case
mapping -- always one rune in, one rune out. `İ` maps to `i`, no
expansion.

VS Code's `String.toLowerCase()` does produce the expanded form, but the
scorer compares UTF-16 code units independently and sidesteps the
problem in practice.

Neither tool maintains a mapping between expanded and original index
spaces.

## Trade-off

Searching for the combining dot above (U+0307) won't match `İ` in a path
anymore. Nobody types combining characters in a file picker, and fzf
doesn't support it either.

## Screenshot
<img width="1282" height="458" alt="Screenshot 2026-04-02 at 09 56 34"
src="https://github.com/user-attachments/assets/720d327a-4855-4d4d-989e-cbd1c0657f97"
/>


Release Notes:
- Fixed a crash and improved matching and highlighting in the file
picker for paths with non-ASCII characters (e.g., Turkish İ, ß, ﬁ).

---------

Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2026-04-03 11:00:13 +00:00

Zed


Welcome to Zed, a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.


Installation

On macOS, Linux, and Windows you can download Zed directly or install Zed via your local package manager (macOS/Linux/Windows).

Other platforms are not yet available.

Developing Zed

Contributing

See CONTRIBUTING.md for ways you can contribute to Zed.

Also... we're hiring! Check out our jobs page for open roles.

Licensing

License information for third party dependencies must be correctly provided for CI to pass.

We use cargo-about to automatically comply with open source licenses. If CI is failing, check the following:

  • Is it showing a `no license specified` error for a crate you've created? If so, add `publish = false` under `[package]` in your crate's `Cargo.toml`.
  • Is the error `failed to satisfy license requirements` for a dependency? If so, first determine what license the project has and whether this system is sufficient to comply with this license's requirements. If you're unsure, ask a lawyer. Once you've verified that this system is acceptable, add the license's SPDX identifier to the `accepted` array in `script/licenses/zed-licenses.toml`.
  • Is `cargo-about` unable to find the license for a dependency? If so, add a clarification field at the end of `script/licenses/zed-licenses.toml`, as specified in the cargo-about book.

Sponsorship

Zed is developed by Zed Industries, Inc., a for-profit company.

If you'd like to financially support the project, you can do so via GitHub Sponsors. Sponsorships go directly to Zed Industries and are used as general company revenue. There are no perks or entitlements associated with sponsorship.

Languages
Rust 95.1%
JSON-with-Comments 2.7%
Inno Setup 0.5%
Scheme 0.4%
Shell 0.3%
Other 0.8%