[core] Collapse duplicate to_remove_ read in Scheduler::call fast path

Read to_remove_ once at the top of the cleanup block instead of loading
it twice (once via cleanup_() -> to_remove_empty_(), once again for the
MAX_LOGICALLY_DELETED_ITEMS check). GCC cannot CSE across the memw
barriers that std::atomic<uint32_t>::load emits on Xtensa, so both the
fast-path zero check and the max check were generating independent
memw + l32i sequences:

    memw; l32i a8, [to_remove_]; beqz ...   ; to_remove_empty_()
    memw; l32i a8, [to_remove_]; bltui 5    ; to_remove_count_()

Reading the counter once into a register and branching on the result
collapses the common zero-case to a single memw + l32i + beqz. The
non-zero path pays one extra read after cleanup_slow_path_ (which may
have decremented the counter), but that path already takes lock_ so the
extra load is negligible.
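
As a rough illustration of the control flow described above, here is a
minimal host-side sketch. It is not the ESPHome implementation:
MiniScheduler, its `loads` counter, `top_removable`, and the relaxed memory
order are all hypothetical stand-ins, used only to show that the zero case
costs one atomic load and the non-zero case costs two.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the scheduler; only the counter logic is modeled.
struct MiniScheduler {
  static constexpr uint32_t MAX_LOGICALLY_DELETED_ITEMS = 5;

  std::atomic<uint32_t> to_remove{0};
  uint32_t loads = 0;             // instrumentation only: counted atomic loads
  uint32_t top_removable = 0;     // assumed: cancelled items sitting at the heap top
  bool full_cleanup_ran = false;  // instrumentation only

  // Counted read of the atomic counter (memory order is an assumption).
  uint32_t to_remove_count() {
    ++loads;
    return to_remove.load(std::memory_order_relaxed);
  }

  // Models the locked slow path: pops whatever cancelled items are at the
  // top of the heap and decrements the counter accordingly.
  void cleanup_slow_path() {
    uint32_t pending = to_remove.load(std::memory_order_relaxed);
    uint32_t popped = pending < top_removable ? pending : top_removable;
    to_remove.fetch_sub(popped, std::memory_order_relaxed);
    top_removable -= popped;
  }

  // Models the full sweep that clears every logically deleted item.
  void full_cleanup() {
    to_remove.store(0, std::memory_order_relaxed);
    full_cleanup_ran = true;
  }

  // The restructured cleanup block: one load decides the fast path, and the
  // counter is re-read only after the slow path may have changed it.
  void cleanup_before_call() {
    if (to_remove_count() != 0) {
      cleanup_slow_path();
      if (to_remove_count() >= MAX_LOGICALLY_DELETED_ITEMS) {
        full_cleanup();
      }
    }
  }
};
```

With the counter at zero, cleanup_before_call() performs a single counted
load and returns. With six pending removals of which only one is at the heap
top, it performs two counted loads and falls through to the full cleanup.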

Scheduler::call stays at 344 B (unchanged); no flash layout shift, so
adjacent code (feed_wdt_slow_ etc.) stays in the same cache lines and
the measurement can't be confounded by MMIO timing drift.

An earlier version of this change also marked Scheduler::millis_64()
ESPHOME_ALWAYS_INLINE to drop its out-of-line $isra$0 clone; that
grew Scheduler::call by 56 B, shifted feed_wdt_slow_ in flash, and
measurably regressed the wdt bucket on a busier winefridge.yaml
config, so the inlining is dropped. This change keeps only the memw
collapse.
J. Nick Koston
2026-04-24 16:56:02 -05:00
parent f62972c2c6
commit 614d018e4b
+17 -7
@@ -601,14 +601,24 @@ uint32_t HOT Scheduler::call(uint32_t now) {
   }
 #endif /* ESPHOME_DEBUG_SCHEDULER */
-  // Cleanup removed items before processing
-  // First try to clean items from the top of the heap (fast path)
-  this->cleanup_();
+  // Cleanup removed items before processing. Read the counter once so the
+  // common zero-case is a single atomic load + branch; the old code called
+  // cleanup_() (which loads to_remove_) and then to_remove_count_() again for
+  // the MAX check, producing a redundant memw+load pair on the fast path.
+  // GCC cannot CSE across the memw barriers that std::atomic<uint32_t>::load
+  // emits on Xtensa, so the duplicate load was unavoidable without manual
+  // restructuring.
+  if (this->to_remove_count_() != 0) {
+    // First try to clean items from the top of the heap (fast path).
+    this->cleanup_slow_path_();
-  // If we still have too many cancelled items, do a full cleanup
-  // This only happens if cancelled items are stuck in the middle/bottom of the heap
-  if (this->to_remove_count_() >= MAX_LOGICALLY_DELETED_ITEMS) {
-    this->full_cleanup_removed_items_();
+    // If we still have too many cancelled items, do a full cleanup.
+    // This only happens if cancelled items are stuck in the middle/bottom
+    // of the heap. Re-read to_remove_ because cleanup_slow_path_ may have
+    // decremented it.
+    if (this->to_remove_count_() >= MAX_LOGICALLY_DELETED_ITEMS) {
+      this->full_cleanup_removed_items_();
+    }
   }
   // IMPORTANT: This loop uses index-based access (items_[0]), NOT iterators.
   // This is intentional — fired intervals are pushed back into items_ via