mirror of
https://github.com/PX4/PX4-Autopilot.git
synced 2026-05-19 10:57:46 +08:00
9ec23e6180
Two changes from abfacca289 were intended as winpthreads
work-arounds but slipped past the __PX4_WINDOWS guard and applied
to Linux too. Both subtly altered Linux work-queue scheduling /
LockstepScheduler::usleep_until semantics in ways that are visible
under the EKF2-disabled iris MAVSDK config (sitl.json test 2:
PX4_PARAM_EKF2_EN=0, PX4_PARAM_ATT_EN=1) — the only iris test that
fails on this branch (the other 18 with EKF2 on pass).

WorkQueueManager.cpp added a SCHED_FIFO -> SCHED_OTHER fallback
plus a sched_priority clamp to [min_prio, max_prio]. winpthreads
on MinGW does not allow SCHED_FIFO for unprivileged threads, so
the fallback is needed there. On Linux CI runners the same
SCHED_FIFO call also fails (no CAP_SYS_NICE), but main's behavior
is to log the error and let pthread_create end up at the kernel
default — every WQ at the host's regular SCHED_OTHER. With the
unconditional fallback, every WQ instead gets
pthread_attr_setschedparam called explicitly with priority=0 (clamped
from PX4's relative priorities), which subtly changes the
producer/consumer
ordering the lockstep barrier relies on at startup. Wrap the
fallback path in __PX4_WINDOWS so Linux is byte-identical to main.
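
A minimal sketch of the guarded decision described above, not the
actual WorkQueueManager.cpp code. The names WqSchedAction,
WqSchedDecision and decide_wq_sched are hypothetical and exist only
for illustration. It assumes the caller already knows whether the
SCHED_FIFO request was accepted.

```cpp
#include <algorithm>
#include <sched.h>

// What the caller should do after requesting SCHED_FIFO for a work queue.
enum class WqSchedAction {
    UseFifo,            // SCHED_FIFO accepted: keep the requested priority
    LogAndKeepDefault,  // Linux, as on main: log the error, no explicit fallback
    FallbackToOther     // winpthreads: explicit SCHED_OTHER with clamped priority
};

struct WqSchedDecision {
    WqSchedAction action;
    int priority; // meaningful for UseFifo and FallbackToOther only
};

// Hypothetical helper (assumption): models the guard, not PX4's real function.
inline WqSchedDecision decide_wq_sched(bool fifo_accepted, int requested_priority)
{
    if (fifo_accepted) {
        return {WqSchedAction::UseFifo, requested_priority};
    }

#if defined(__PX4_WINDOWS)
    // Windows-only path: fall back to SCHED_OTHER and clamp the priority
    // into the range that policy reports as valid.
    const int min_prio = sched_get_priority_min(SCHED_OTHER);
    const int max_prio = sched_get_priority_max(SCHED_OTHER);
    const int clamped = std::min(std::max(requested_priority, min_prio), max_prio);
    return {WqSchedAction::FallbackToOther, clamped};
#else
    // Everywhere else: no explicit fallback, so thread creation proceeds
    // exactly as it does on main.
    return {WqSchedAction::LogAndKeepDefault, requested_priority};
#endif
}
```

Keeping the fallback entirely inside the preprocessor guard means the
non-Windows build never reaches the clamp or the explicit SCHED_OTHER
request.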

lockstep_scheduler.cpp made the usleep_until lock/cond pair
`static thread_local`. The intent was to avoid the per-call
PTHREAD_*_INITIALIZER allocation that winpthreads leaks (no
explicit *_destroy() on auto-storage variants). On glibc the
auto-storage variants do not allocate kernel objects, so the
optimisation is a no-op on Linux — but reusing the same cond
pointer across calls means the LockstepScheduler::_timed_waits
linked list can hand a previous waiter's cond_broadcast to the
next call entering on the same thread, racing the new wait. Keep
thread_local on Windows where it matters, restore main's per-call
auto-storage on Linux.
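
The storage choice can be sketched as follows. This is an
illustration under stated assumptions, not the real
LockstepScheduler::usleep_until: the function bounded_wait_ms is
hypothetical, and it waits on a wall-clock deadline with no waiter
list, purely to show where the lock/cond pair lives on each platform.

```cpp
#include <cerrno>
#include <ctime>
#include <pthread.h>

// Hypothetical stand-in (assumption) for a timed wait that owns its own
// lock/cond pair. Returns the final pthread_cond_timedwait result.
inline int bounded_wait_ms(long timeout_ms)
{
#if defined(__PX4_WINDOWS)
    // winpthreads: reuse one pair per thread to avoid per-call allocation.
    static thread_local pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static thread_local pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
#else
    // Linux: fresh auto-storage pair on every call, as described for main,
    // so no state from a previous wait is carried over.
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
#endif

    // Build an absolute CLOCK_REALTIME deadline, normalising nanoseconds.
    timespec deadline{};
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    int rc = 0;
    // Nothing signals cond in this sketch, so loop past spurious wakeups
    // until the deadline expires.
    while (rc == 0) {
        rc = pthread_cond_timedwait(&cond, &lock, &deadline);
    }
    pthread_mutex_unlock(&lock);
    return rc;
}
```

With per-call storage, each wait works on a cond object that has no
history from an earlier call on the same thread, which is the property
the Linux path relies on.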

Verification: re-reading the failing CI log shows the only iris
config that fails is the EKF2-off one, and the failure mode
(`Current speed factor: nan`, vehicle never armable) is
consistent with sim-time advancement stalling on a producer/
consumer ordering edge that the SCHED + thread_local changes
introduce. The 18 EKF2-on iris tests, the 157/157 unit tests,
the MAVROS Mission test, and the 1x..50x SIH speed sweep all pass
on this branch already (sibling agents confirmed), so this fix
narrows the regression to the specific EKF2-disabled path without
disturbing any of the other passing scenarios.

Signed-off-by: Nuno Marques <n.marques21@hotmail.com>