Both were defensive against rare edge cases the reviewer flagged but
which don't justify the complexity:
- $log clone: protects against tag pollution on the per-request Log
if reportError fires AND the request later errors. http.php's
request-end handler overwrites the core fields (namespace, message,
action, etc.) anyway; only addTag/addExtra accumulate. Aligns with
Embeddings/Text/Create.php precedent which mutates $log directly.
- sdks in-lock re-read: closes a sequential-acquire stale-read race on
the sdks append-list. The race exists but the impact is bounded —
one SDK registration delayed until the next request from that SDK
fires. Self-healing on retry. The codebase already accepts this
exact race for auths, oAuthProviders, services, identities, sessions,
factors, etc. Special-casing this one site is precision the analytics
use case doesn't need.
- TTL defaults (5s skip / 10s fail), 3s wait timeout, and 60s sentry
rate-limit window are now class constants.
- Telemetry outcome labels (acquired/skipped/contended/backend_error/
release_error) are class constants — typo-safe across the 9 use sites.
- Fix: inferTargetFromKey was indexing into segment 2 with limit 4,
which returned the project sequence instead of the collection name
ever since project-id was added to the key in c2a249c48b. Telemetry
was tagging 'target' with project sequences instead of 'projects' /
'users' / etc. Now indexes segment 3 with limit 5 to correctly pick
the collection segment from lock:platform:{project}:{target}:...
Reverts the catch widening from earlier — the underlying
Utopia\Lock\Distributed::tryAcquire only calls ext-redis methods,
which throw RedisException exclusively. Catching Throwable was
over-broad and would silently swallow real bugs (TypeError, Error)
into a fail-open path.
The sdks attribute is an append-only list, not idempotent. With the
read happening outside the lock, two sequential acquirers could each
read the same stale list and overwrite each other's appends.
Now the lock body re-reads the keys document and re-derives the sdks
array from the fresh state. Skip-on-contention still drops the update
when the lock is held, but a same-SDK retry on the next request picks
the registration up.
Bounded loss only affects the rare 'first-seen' SDK request that
happens to land while the lock is held; sequential traffic from the
same SDK (or any later request from any SDK) re-attempts and writes.
Co-authored-by: greptile-apps[bot]
Two P1s from review:
- Acquire path caught only \RedisException; any other Throwable from the
lock library would escape and skip the fail-open callback. Widened to
\Throwable so any backend failure falls back to running unlocked.
- reportError mutated the per-request $log directly, leaving it in a
'lock.{action}' state that the request-end handler would then submit
as the request log. Clones $log into a dedicated $lockLog so the
shared instance is untouched.
Covers the four public methods (set, run, runOrFail, withKey) plus the
kill switch and the project-fallback behavior.
Tests against a real Redis (the appwrite container has it always-on);
no markTestSkipped fallback — the suite fails loudly if Redis is
unreachable rather than silently passing.
Cloud production runs four separate single-master+replica Dragonfly
deployments (cache, queue-dragonfly, queue-usage, pubsub-dragonfly),
not sharded Redis Cluster topology — confirmed by deploy/cloud/values
+ environments/production/*.values.yaml (Dragonfly Operator with
replicas=2 = 1 primary + 1 read replica), and by the dev DSN scheme
'redis://' (not 'redis-cluster://').
So a standard \Redis client suffices for the direct redis resource
(timelimit, Lock). Cloud just needs to pass _APP_REDIS_HOST/PORT/USER/
PASS through to the appwrite container — handled in the cloud PR's
docker-compose.yml change.
This reverts the resource to its original pre-PR shape. The
utopia-php/lock cluster-support PR (utopia-php/lock#1) stays open at
upstream as a future-ready option if cloud ever moves to actual
Redis Cluster mode.
Per-manager request, lock keys are now prefixed with the project's
internal id (sequence) so that:
- Locks are partitioned by project — Redis cluster slot affinity
if/when sharded.
- Cross-project requests can't compete on the same key for
collection-scoped resources.
- Telemetry (counter + Sentry tags) carries 'project' alongside
'target', so dashboards can filter contention by project.
Key shapes:
set: lock:platform:{project}:{collection}:{id}:{attribute}
run/orFail: lock:platform:{project}:{collection}:{id}
withKey: raw (caller-provided)
Lock now requires a project document at construction. All existing
call sites (4 in CE + 2 in cloud) run inside Http::init()-resolved
request scope where the project document is set, so no migration
needed. Workers/CLI without project context can use withKey directly.
Every CI run pushes ghcr.io/<repo>/appwrite-dev:<sha> and nothing
removes it. On an active repo with many PRs the GHCR storage grows
without bound. Add a cleanup job that runs after all consumer jobs
complete (always, even if some fail) and deletes the SHA-tagged
package version via the Packages API.
Addresses Greptile feedback on appwrite/appwrite#12176.
The build job uploads the appwrite-dev image as an actions artifact
(~hundreds of MB), and 30+ E2E test jobs all pull it concurrently with
actions/download-artifact. GitHub Actions' artifact storage struggles
with that many parallel downloads and intermittently fails with
BlobNotFound or 'Artifact download failed after 5 retries'.
Push the built image to ghcr.io/<repo>/appwrite-dev:<sha> in the build
job and pull from GHCR in each test job. GHCR handles parallel image
fetches without throttling.
Mirrors appwrite-labs/cloud#3906.
The dedicated \Redis DI resource (used by timelimit and the new Lock
class) was reading _APP_REDIS_HOST/PORT/PASS exclusively. Cloud
deployments configure cache via _APP_CONNECTIONS_CACHE URI form
(e.g. cache=redis://dragonfly:6379) and don't pass the legacy
_APP_REDIS_* vars to the appwrite container locally, so timelimit and
Lock both fail to connect outside production where Helm separately
injects the legacy vars.
Now prefers _APP_CONNECTIONS_CACHE when set (matching the cache pool
backend), falls back to _APP_REDIS_* for CE-style configs. No new env
vars introduced; both timelimit and Lock work in CE, cloud-local, and
cloud-production without compose changes.
Lock now uses Utopia\Lock\Distributed directly and owns the full
acquire/release/telemetry/error-reporting/fail-open/kill-switch logic
that previously lived in two inline DI factory closures.
Adds withKey($key, $fn, $ttl, $orFail, $waitTimeout) as a generic
escape hatch for non-platform key shapes (cache, queue, edge) and
unusual TTL/timeout requirements.
Per-attribute lock keys for set() so that an accessedAt bump and a
mcpAccessedAt bump on the same projects:{id} document don't compete.
Whole-document operations (run, runOrFail) keep document-level keys.
Removes the standalone distributedLock and distributedLockOrFail DI
factories — Lock is the single API.
request.php shrinks ~150 LOC; Lock.php grows to ~190 LOC.
Extracts the lock-key format and the lock+auth-skip+sparse-update pattern
into Appwrite\Locking\Lock with three methods:
- set(collection, id, attribute=accessedAt, value=null) — throttled
single-attribute write
- run(collection, id, fn) — generic skip-on-contention
- runOrFail(collection, id, fn) — block-then-409 for the deferred
lost-update follow-up
Migrates the 4 call sites (router projects accessedAt + 3 in shared/api)
off the raw $distributedLock callable. Raw factories stay as escape
hatches for non-platform key shapes.