I believe these tests may have been flaky as a result of the bug fixed
in the previous commit. A big hint is that they were all crashing with
SIGSEGV with no stack trace. I suspect that some lingering SIGIOs from
cancelations were being delivered to a thread after its `munmap` call,
which was happening because the test runner called `Io.Threaded.deinit`
to cause all of the (detached) worker threads to exit.
If this passes, I'll re-run the x86_64-linux CI jobs on this commit a
few times before merge to try and be sure there are no lingering
failures.
Resolves: https://codeberg.org/ziglang/zig/issues/30096
Resolves: https://codeberg.org/ziglang/zig/issues/30592
Resolves: https://codeberg.org/ziglang/zig/issues/30682
* cache nul handle for child process execution
* make opening the nul file integrate properly with cancelation
* replace all calls to SleepEx to parking_sleep.sleep instead, making
them properly cancelable. These sleeps are workarounds for Windows
kernel bugs. Now you can even cancel while waiting for kernel bug
workarounds!
this gets the build runner compiling again on linux
this work is incomplete; it only moves code around so that environment
variables can be wrangled properly. a future commit will need to audit
the cancelation and error handling of this moved logic.
This was missed when updating to the new group cancelation API, and
caused illegal behavior in many cases (the condition was simply that a
DNS query returned a second result before a connection was successfully
established).
This commit includes some API changes which I agreed with Andrew as a
follow-up to the recent `Io.Group` changes:
* `Io.Group.await` *does* propagate cancelation to group tasks; it then
waits for them to complete, and *also* returns `error.Canceled`. The
assertion that group tasks handle `error.Canceled` "correctly" means
this behavior is loosely analagous to how awaiting a future works. The
important thing is that the semantics of `Group.await` and
`Future.await` are similar, and `error.Canceled` will always be
visible to the caller (assuming correct API usage).
* `Io.Group.awaitUncancelable` is removed.
* `Future.await` calls `recancel` only if the "child" task (the future
being awaited) did not acknowledge cancelation. If it did, then it is
assumed that the future will propagate `error.Canceled` through
`await` as needed.
As of this branch, the performance impact of robust cancelation is now
negligible (and in fact entirely unmeasurable in almost all cases), so
there is no good reason to not enable it in all cases. The performance
issues before were primarily down to a typo in the robust cancelation
logic which resulted in every canceled syscall potentially being sent
hundreds of signals in quick succession, because the delay between
signals started out at 1ns instead of 1us!
The most interesting thing here is the replacement of the pthread futex
implementation with an implementation based on thread park/unpark APIs.
Thread parking tends to be the primitive provided by systems which do
not have a futex primitive, such as NetBSD, so this implementation is
far more efficient than the pthread one. It is also useful on Windows,
where `RtlWaitOnAddress` is itself a userland implementation based on
thread park/unpark; we can implement it ourselves including support for
features which Windows' implementation lacks, such as cancelation and
waking a number of waiters with 1<n<infinity.
Compared to the pthread implementation, this thread-parking-based one
also supports full robust cancelation. Thread parking also turns out to
be useful for implementing `sleep`, so is now used for that on Windows
and NetBSD.
This commit also introduces proper cancelation support for most Windows
operations. The most notable omission right now is DNS lookups through
`GetAddrInfoEx`, just because they're a little more work due to having
a unique cancelation mechanism---but the machinery is all there, so I'll
finish gluing it together soon.
As of this commit, there are very few parts of `Io.Threaded` which do
not support full robust cancelation. The only ones which actually really
matter (because they could block for a prolonged period of time) are DNS
lookups on Windows (as discussed above) and futex waits on WASM.
The goal of this internal refactor is to fix some bugs in cancelation
and allow group tasks to clean up their own resources eagerly. The
latter will become a guarantee of the `std.Io` interface, which is
important so that groups can be used to "detach" tasks.
This commit changes the API which POSIX system calls use internally (the
functions formerly called `beginSyscall` etc), but does not update the
usage sites yet.
Now, the return type of functions spawned with `Group.async` and
`Group.concurrent` may be anything that coerces to `Io.Cancelable!void`.
Before this commit, group tasks were the only exception to the rule
"error.Canceled should never be swallowed". Now, there is no exception,
and it is enforced with an assertion upon closure completion.
Finally, fixes a case of swallowing error.Canceled in the compiler,
solving a TODO.
There are three ways to handle `error.Canceled`. In order of most
common:
1. Propagate it
2. After receiving it, io.recancel() and then don't propagate it
3. Make it unreachable with io.swapCancelProtection()
Rename `wait` to `await` to be consistent with Future API. The
convention here is that this set of functionality goes together:
* async/concurrent
* await/cancel
Also rename Select `wait` to `await` for the same reason.
`Group.await` now can return `error.Canceled`. Furthermore,
`Group.await` does not auto-propagate cancelation. Instead, users should
follow the pattern of `defer group.cancel(io);` after initialization,
and doing `try group.await(io);` at the end of the success path.
Advanced logic can choose to do something other than this pattern in the
event of cancelation.
Additionally, fixes a bug in `std.Io.Threaded` future await, in which it
swallowed an `error.Canceled`. Now if a task is canceled while awaiting
a future, after propagating the cancel request, it also recancels,
meaning that the awaiting task will properly detect its own cancelation
at the next cancelation point.
Furthermore, fixes a bug in the compiler where `error.Canceled` was
being swallowed in `dispatchPrelinkWork`.
Finally, fixes std.crypto code that inappropriately used
`catch unreachable` in response to cancelation without even so much as a
comment explaining why it was believed to be unreachable. Now, those
functions have `error.Canceled` in the error set and propagate
cancelation properly.
With this way of doing things, `Group.await` has a nice property: even if
all tasks in the group are CPU bound and without cancelation points, the
`Group.await` can still be canceled. In such case, the task that was
waiting for `await` wakes up with a chance to do some more resource
cleanup tasks, such as canceling more things, before entering the
deferred `Group.cancel` call at which point it has to suspend until the
canceled but uninterruptible CPU bound tasks complete.
closes#30601