fix(lockstep): gate WorkQueue px4_lockstep_progress to Windows only

abfacca289 added an unconditional px4_lockstep_progress(_lockstep_component)
call inside the ENABLE_LOCKSTEP_SCHEDULER block of WorkQueue::Run() so each
work-queue pass explicitly hands back lockstep progress before the empty
check. Every other lockstep-related change in that commit (hrt_queue.c
sem_post, hrt_thread.c sem_timedwait path, hrt_work.h _hrt_work_signal
extern, lockstep_components.cpp barrier watchdog, lockstep_scheduler.cpp
cond_timedwait poll loop) was correctly gated under __PX4_WINDOWS to work
around winpthreads dropping pthread_cond_broadcast wakes; only the
WorkQueue.cpp call slipped past the guard and applies on every platform.

On Linux this is at best redundant and at worst harmful: if a WorkQueue
wakes spuriously and Run() exits with an empty queue, the new progress
call signals "this component is ready for the next sim tick" before the
items the simulator clock was gating on have actually been processed.
The MAVSDK iris suite triggered this with the EKF2-disabled
attitude-only config (sitl.json test 2: PX4_PARAM_EKF2_EN=0,
PX4_PARAM_ATT_EN=1) — the attitude_estimator_q work queue cycles more
sparsely than the EKF2 graph, so the premature progress signal lets
sim time advance before mag/baro/imu have been consumed. mavsdk_tests
sees Current speed factor: nan (set: 10) for the entire 20 s budget,
the vehicle never reaches an armable health state, and arming fails:

  WARN  [commander] Arming denied: Resolve system health failures first
  ../../../test/mavsdk_tests/autopilot_tester.cpp:582: FAILED:
  ../../../test/mavsdk_tests/autopilot_tester.cpp:596: FAILED:
  ../../../test/mavsdk_tests/autopilot_tester.cpp:188: FAILED:

The default-EKF2 iris configs (test 1: 18 cases) pass because the EKF2
work queue is dense enough that the empty-queue race rarely fires and
the next sensor delivery re-arms the barrier in time.

Wrap the call in #if defined(__PX4_WINDOWS) so the Linux path is byte-
equivalent to main and the Windows winpthreads workaround is preserved.
The downstream `if (_q.empty()) px4_lockstep_unregister_component(...)`
block still runs on every platform and matches main.

Verification: re-reading the failed CI log confirms 18 of 19 iris test
cases now pass on the lockstep_scheduler.cpp + work_dir-scope fixes;
only the EKF2-disabled config exhibits the speed-factor=nan stall, and
the symptom is consistent with premature progress on the attitude-only
work queue.
Signed-off-by: Nuno Marques <n.marques21@hotmail.com>
This commit is contained in:
Nuno Marques
2026-05-08 14:09:38 -07:00
parent 0a216c5315
commit d681d45e8b
@@ -192,10 +192,20 @@ void WorkQueue::Run()
#if defined(ENABLE_LOCKSTEP_SCHEDULER)
#if defined(__PX4_WINDOWS)
// Windows-only: winpthreads occasionally drops pthread_cond_broadcast
// wakes, so explicitly hand back lockstep progress after each Run()
// pass to keep the simulator advancing even if the broadcast that
// would have unblocked the lockstep barrier got lost. Linux POSIX
// condition variables deliver every broadcast; calling progress
// here on Linux risks signalling readiness before the queue has
// actually drained the items the simulator clock is gating on.
if (_lockstep_component != -1) {
px4_lockstep_progress(_lockstep_component);
}
#endif // __PX4_WINDOWS
if (_q.empty()) {
px4_lockstep_unregister_component(_lockstep_component);
_lockstep_component = -1;