nuttx/tools/testbuild.sh
Lup Yuen Lee 12e8f92a28 CI: Retry build upon failure
In Jan-Feb 2026, NuttX CI hit a [record high usage of GitHub Runners](https://github.com/apache/nuttx/issues/17914), exceeding the limit enforced by the ASF Infrastructure Team. We analysed the PRs and discovered that most GitHub Runners were wasted on __(1) Failure to Download the Build Dependencies__ for DTC Device Tree, OpenAMP Messaging, MicroADB Debugger, MCUBoot Bootloader, NimBLE Bluetooth, etc. and __(2) Resubmitting PR Commits__:

- [Video: Analysing the Most Expensive PR](https://youtu.be/swFaxaTCEQg)
- [Video: Second Most Expensive PR](https://youtu.be/uSpQkzBogEw)
- [Video: Third Most Expensive PR](https://youtu.be/J7w1gyjwZ1w)
- [Video: Most Expensive Apps PR](https://youtu.be/182h8cRpfvI)
- [Spreadsheet: Most Expensive PRs](https://docs.google.com/spreadsheets/d/1HY7fIZzd_fs3QPyA0TX7vsYOjL86m1fNOf1Wls93luI/edit?gid=70515654#gid=70515654)

Why would __Download Failures__ waste GitHub Runners? Because a Download Failure terminates the Entire CI Build (across All CI Jobs), requiring a restart of the CI Build. And the CI Build isn't terminated immediately upon failure: NuttX CI waits for the CI Job (e.g. arm-01) to complete before terminating the CI Build. This means a CI Build can get terminated 2.5 hours in, wasting 2.5 elapsed hours x [7.4 parallel processes](https://lupyuen.org/articles/ci3#live-metric-for-full-time-runners) of GitHub Runners.
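
To put a number on that: if we treat the 7.4 parallel processes as 7.4 GitHub Runners kept busy for the full 2.5 hours, a single terminated CI Build costs about 18.5 runner-hours. A minimal sketch of the arithmetic (the function name `runner_hours_wasted` is ours, purely for illustration, and is not part of `testbuild.sh`):

```bash
#!/usr/bin/env bash
# Runner-hours lost when a CI Build is terminated part-way through.
# Bash has no floating-point arithmetic, so awk does the multiplication.
runner_hours_wasted() {
  local elapsed_hours=$1     # hours elapsed before the CI Build was terminated
  local parallel_runners=$2  # average number of runners busy in parallel
  awk -v h="$elapsed_hours" -v p="$parallel_runners" \
    'BEGIN { printf "%.1f\n", h * p }'
}

runner_hours_wasted 2.5 7.4   # prints 18.5
```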

This PR proposes to __Retry the Build for Each CI Target__. NuttX CI shall rebuild each CI Target (e.g. `sim:nsh`) upon failure, up to 3 times (total 4 builds). Each rebuild will be attempted after a Randomised Delay with Exponential Backoff: the delay is picked at random between 1 second and a Backoff Limit, which is initially set to 60 seconds, then doubles to 120 seconds and 240 seconds. The rebuilds will mitigate the effects of Intermittent Download Failures that occur in GitHub Actions. (And eliminate developer frustration.)
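
The retry scheme can be sketched as a standalone helper. This is a simplified illustration, not the code in `testbuild.sh`: the names `retry_with_backoff`, `BACKOFF_START` and `SLEEP_CMD` are hypothetical, and the real script calls `dotest` and checks the `fail` variable instead of using the command's exit status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_builds times. Between
# attempts, wait a random 1..backoff seconds, then double the backoff.
retry_with_backoff() {
  local max_builds=$1
  shift
  local backoff=${BACKOFF_START:-60}  # upper bound of the first delay, in seconds
  local attempt delay
  for ((attempt = 1; attempt <= max_builds; attempt++)); do
    echo "Build Attempt $attempt of $max_builds"
    if "$@"; then
      return 0                        # success: no more retries
    fi
    if ((attempt == max_builds)); then
      break                           # final attempt failed: give up
    fi
    delay=$(( (RANDOM % backoff) + 1 ))
    echo "Wait $delay seconds ($backoff backoff)"
    backoff=$((backoff * 2))
    ${SLEEP_CMD:-sleep} "$delay"      # SLEEP_CMD lets a dry run skip the wait
  done
  return 1
}
```

For example, `retry_with_backoff 4 make` would run `make` at most 4 times, waiting up to 60, 120 and 240 seconds between attempts. Randomising the delay keeps parallel CI Jobs from retrying the same download at the same moment.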

If the build fails after 3 retries: Subsequent CI Targets will __not be allowed to rebuild__ upon failure. This is to prevent cascading build failures from overloading GitHub Actions, and consuming too many GitHub Runners.
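
A rough sketch of how this cutoff behaves, assuming a shared `maxbuilds` counter as in the script. The function names `build_target` and `build_with_budget`, the target names and the `attempts_log` variable are hypothetical, and the backoff delay is left out for brevity.

```bash
#!/usr/bin/env bash
maxbuilds=4      # shared by all targets: 1 build + 3 retries
attempts_log=""  # records "target:attempt" for every build that ran

# Hypothetical stand-in for a real build: targets named bad* always fail
build_target() { [[ $1 != bad* ]]; }

build_with_budget() {
  local target=$1 i
  for ((i = 1; i <= maxbuilds; i++)); do
    attempts_log+="$target:$i "
    if build_target "$target"; then
      return 0
    fi
    if ((i == maxbuilds)); then
      maxbuilds=1  # retries exhausted: later targets get one attempt only
    fi
  done
  return 1
}

for target in good-a bad-a bad-b good-b; do
  build_with_budget "$target" || true
done
echo "$attempts_log"
# prints: good-a:1 bad-a:1 bad-a:2 bad-a:3 bad-a:4 bad-b:1 good-b:1
```

Here `bad-a` uses all 4 builds, so `bad-b` is attempted only once, while targets that build cleanly are unaffected.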

Note that NuttX CI shall retry the build for __Any Kind of Build Failure__, including Download Failures, Compile Errors and Config Errors. We kept the design deliberately simple because of our current constraints: (1) Lack of CI Expertise (2) NuttX CI is Mission Critical (3) Legacy CI Scripts are Highly Complex. To prevent Compile Errors and Config Errors: We expect NuttX Devs to [Build and Test PRs in Our Own Repos](https://github.com/apache/nuttx/issues/18568), before submitting to NuttX.

What about __Resubmitting PR Commits__ and its wastage of GitHub Runners? We also require NuttX Devs to [Build and Test PRs in Our Own Repos](https://github.com/apache/nuttx/issues/18568), before resubmitting to NuttX. GitHub Runners will then be charged to the developer's quota, without affecting the GitHub Runners quota for Apache NuttX Project. We plan to [Kill All CI Jobs](https://youtu.be/182h8cRpfvI?si=MmAuwLISZPPMoqDq&t=1479) for PRs that have been switched to Draft Mode. We'll monitor this through the [NuttX Build Monitor](https://github.com/apache/nuttx/issues/18659).

Modified Files:

`tools/testbuild.sh`: We introduce a New Wrapper Function `retrytest` that will call the Existing Function `dotest`, to build the CI Target and retry on error.

`Documentation/components/tools/testbuild.rst`: Updated the `testbuild.sh` doc with the Retry Logic.

Signed-off-by: Lup Yuen Lee <luppy@appkaki.com>
2026-04-15 12:30:17 +02:00


#!/usr/bin/env bash
# tools/testbuild.sh
#
# SPDX-License-Identifier: Apache-2.0
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership. The
# ASF licenses this file to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the
# License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
WD=$(cd $(dirname $0) && cd .. && pwd)
nuttx=$WD/../nuttx
progname=$0
fail=0
maxbuilds=4 # Retry 3 times on failure
APPSDIR=$WD/../apps
if [ -z $ARTIFACTDIR ]; then
ARTIFACTDIR=$WD/../buildartifacts
fi
MAKE_FLAGS=-k
EXTRA_FLAGS="EXTRAFLAGS="
MAKE=make
unset testfile
unset HOPTION
unset STORE
unset JOPTION
PRINTLISTONLY=0
GITCLEAN=0
SAVEARTIFACTS=0
CHECKCLEAN=1
CODECHECKER=0
NINJACMAKE=0
RUN=0
case $(uname -s) in
Darwin*)
HOST=Darwin
;;
CYGWIN*)
HOST=Cygwin
;;
MINGW32*)
HOST=MinGw
;;
MSYS*)
HOST=Msys
;;
*)
# Assume linux as a fallback
HOST=Linux
;;
esac
function showusage {
echo ""
echo "USAGE: $progname -h [-l|m|c|g|n] [-d] [-e <extraflags>] [-x] [-j <ncpus>] [-a <appsdir>] [-t <topdir>] [-p]"
echo " [-A] [-C] [-G] [-N] [-R] [-S] [--codechecker] <testlist-file>"
echo ""
echo "Where:"
echo " -h will show this help text and terminate"
echo " -l|m|c|g|n selects Linux (l), macOS (m), Cygwin (c),"
echo " MSYS/MSYS2 (g) or Windows native (n). Default Linux"
echo " -d enables script debug output"
echo " -e pass extra c/c++ flags such as -Wno-cpp via make command line"
echo " -x exit on build failures"
echo " -j <ncpus> passed on to make. Default: No -j make option."
echo " -a <appsdir> provides the relative path to the apps/ directory. Default ../apps"
echo " -t <topdir> provides the absolute path to top nuttx/ directory. Default ../nuttx"
echo " -p only print the list of configs without running any builds"
echo " -A store the build executable artifact in ARTIFACTDIR (defaults to ../buildartifacts)"
echo " -C Skip tree cleanness check."
echo " -G Use \"git clean -xfdq\" instead of \"make distclean\" to clean the tree."
echo " This option may speed up the builds. However, note that:"
echo " * This assumes that your trees are git based."
echo " * This assumes that only nuttx and apps repos need to be cleaned."
echo " * If the tree has files not managed by git, they will be removed"
echo " as well."
echo " -N Use CMake with Ninja as the backend."
echo " -R execute \"run\" script in the config directories if exists."
echo " -S Adds the nxtmpdir folder for third-party packages."
echo " --codechecker enables CodeChecker to statically analyze the code."
echo " <testlist-file> selects the list of configurations to test. No default"
echo ""
echo "Your PATH variable must include the path to both the build tools and the"
echo "kconfig-frontends tools"
echo ""
exit 1
}
# Parse command line
while [ ! -z "$1" ]; do
case $1 in
-l | -m | -c | -g | -n )
HOPTION+=" $1"
;;
-d )
set -x
;;
-e )
shift
EXTRA_FLAGS+="$1"
;;
-x )
MAKE_FLAGS='--silent --no-print-directory'
set -e
;;
-a )
shift
APPSDIR="$1"
;;
-j )
shift
JOPTION="-j $1"
;;
-t )
shift
nuttx="$1"
;;
-p )
PRINTLISTONLY=1
;;
-G )
GITCLEAN=1
;;
-A )
SAVEARTIFACTS=1
;;
-C )
CHECKCLEAN=0
;;
-N )
NINJACMAKE=1
;;
-R )
RUN=1
;;
-S )
STORE+=" $1"
;;
--codechecker )
CODECHECKER=1
;;
-h )
showusage
;;
* )
testfile="$1"
shift
break
;;
esac
shift
done
if [ ! -z "$1" ]; then
echo "ERROR: Garbage at the end of line"
showusage
fi
if [ -z "$testfile" ]; then
echo "ERROR: Missing test list file"
showusage
fi
if [ ! -r "$testfile" ]; then
echo "ERROR: No readable file exists at $testfile"
showusage
fi
if [ ! -d "$nuttx" ]; then
echo "ERROR: Expected to find nuttx/ at $nuttx"
showusage
fi
if [ ! -d $APPSDIR ]; then
echo "ERROR: No directory found at $APPSDIR"
exit 1
fi
export APPSDIR
testlist=`grep -v -E "^(-|#)|^[Cc][Mm][Aa][Kk][Ee]" $testfile || true`
blacklist=`grep "^-" $testfile || true`
if [ ${NINJACMAKE} -eq 1 ]; then
cmakelist=`grep "^[Cc][Mm][Aa][Kk][Ee]" $testfile | cut -d',' -f2 || true`
fi
cd $nuttx || { echo "ERROR: failed to CD to $nuttx"; exit 1; }
function exportandimport {
# Do nothing unless the nuttx binary has been built.
if [ ! -f nuttx ]; then
return $fail
fi
# If CONFIG_BUILD_KERNEL=y does not exist in .config, do nothing
if ! grep CONFIG_BUILD_KERNEL=y .config 1>/dev/null; then
return $fail
fi
if ! ${MAKE} export ${JOPTION} 1>/dev/null; then
fail=1
return $fail
fi
pushd ../apps/
if ! ./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz 1>/dev/null; then
fail=1
popd
return $fail
fi
if ! ${MAKE} import ${JOPTION} 1>/dev/null; then
fail=1
fi
popd
return $fail
}
function compressartifacts {
local target_path=$1
local target_name=$2
pushd $target_path >/dev/null
tar zcf ${target_name}.tar.gz ${target_name}
rm -rf ${target_name}
popd >/dev/null
}
function makefunc {
if ! ${MAKE} ${MAKE_FLAGS} "${EXTRA_FLAGS}" ${JOPTION} $@ 1>/dev/null; then
fail=1
else
exportandimport
fi
return $fail
}
function checkfunc {
build_cmd="${MAKE} ${MAKE_FLAGS} \"${EXTRA_FLAGS}\" ${JOPTION} 1>/dev/null"
local config_sub_path=$(echo "$config" | sed "s/:/\//")
local sub_target_name=${config_sub_path#$(dirname "${config_sub_path}")/}
local codechecker_dir=${ARTIFACTDIR}/codechecker_logs/${config_sub_path}
mkdir -p "${codechecker_dir}"
echo " Checking NuttX by Codechecker..."
CodeChecker check -b "${build_cmd}" -o "${codechecker_dir}/logs" -e sensitive --ctu
codecheck_ret=$?
echo " Storing analysis result to CodeChecker..."
echo " Generating HTML report..."
CodeChecker parse --export html --output "${codechecker_dir}/html" "${codechecker_dir}/logs" 1>/dev/null
echo " Compressing logs..."
compressartifacts "$(dirname "${codechecker_dir}")" "${sub_target_name}"
# If you need to stop CI, uncomment the following line.
# if [ $codecheck_ret -ne 0 ]; then
# fail=1
# fi
return $fail
}
# Clean up after the last build
function distclean {
echo " Cleaning..."
if [ -f .config ] || [ -f build/.config ]; then
if [ ${GITCLEAN} -eq 1 ] || [ ! -z ${cmake} ]; then
git -C $nuttx clean -xfdq
git -C $APPSDIR clean -xfdq
else
makefunc distclean
# Remove .version manually because this file is shipped with
# the release package and then distclean has to keep it.
rm -f .version
# Ensure nuttx and apps directory in clean state even with --ignored
if [ ${CHECKCLEAN} -ne 0 ]; then
if [ -d $nuttx/.git ] || [ -d $APPSDIR/.git ]; then
if [[ -n $(git -C $nuttx status --ignored -s) ]]; then
git -C $nuttx status --ignored
fail=1
fi
if [[ -n $(git -C $APPSDIR status --ignored -s) ]]; then
git -C $APPSDIR status --ignored
fail=1
fi
fi
fi
fi
fi
return $fail
}
# Configure for the next build
function configure_default {
if ! ./tools/configure.sh ${HOPTION} ${STORE} $config ${JOPTION} 1>/dev/null; then
fail=1
fi
if [ "X$toolchain" != "X" ]; then
setting=`grep _TOOLCHAIN_ $nuttx/.config | grep -v CONFIG_TOOLCHAIN_WINDOWS | grep -v "CONFIG_ARCH_TOOLCHAIN_*" | grep =y`
original_toolchain=`echo $setting | cut -d'=' -f1`
if [ ! -z "$original_toolchain" ]; then
echo " Disabling $original_toolchain"
kconfig-tweak --file $nuttx/.config -d $original_toolchain
fi
echo " Enabling $toolchain"
kconfig-tweak --file $nuttx/.config -e $toolchain
makefunc olddefconfig
fi
return $fail
}
function configure_cmake {
if ! cmake -B build -DBOARD_CONFIG=$config -GNinja 1>/dev/null; then
cmake -B build -DBOARD_CONFIG=$config -GNinja
fail=1
fi
if [ "X$toolchain" != "X" ]; then
setting=`grep _TOOLCHAIN_ $nuttx/build/.config | grep -v CONFIG_TOOLCHAIN_WINDOWS | grep -v "CONFIG_ARCH_TOOLCHAIN_*" | grep =y`
original_toolchain=`echo $setting | cut -d'=' -f1`
if [ ! -z "$original_toolchain" ]; then
echo " Disabling $original_toolchain"
kconfig-tweak --file $nuttx/build/.config -d $original_toolchain
fi
echo " Enabling $toolchain"
kconfig-tweak --file $nuttx/build/.config -e $toolchain
fi
return $fail
}
function configure {
echo " Configuring..."
if [ ! -z ${cmake} ]; then
configure_cmake
else
configure_default
fi
}
# Perform the next build
function build_default {
if [ "${CODECHECKER}" -eq 1 ]; then
checkfunc
else
makefunc
fi
if [ ${SAVEARTIFACTS} -eq 1 ]; then
artifactconfigdir=$ARTIFACTDIR/$(echo $config | sed "s/:/\//")/
mkdir -p $artifactconfigdir
xargs -I "{}" cp "{}" $artifactconfigdir < $nuttx/nuttx.manifest
fi
return $fail
}
function build_cmake {
if ! cmake --build build 1>/dev/null; then
cmake --build build
fail=1
fi
if [ ${SAVEARTIFACTS} -eq 1 ]; then
artifactconfigdir=$ARTIFACTDIR/$(echo $config | sed "s/:/\//")/
mkdir -p $artifactconfigdir
cd $nuttx/build
xargs -I "{}" cp "{}" $artifactconfigdir < $nuttx/build/nuttx.manifest
cd $nuttx
fi
return $fail
}
function build {
echo " Building NuttX..."
if [ ! -z ${cmake} ]; then
build_cmake
else
build_default
fi
}
function refresh_default {
# Ensure defconfig in the canonical form
if ! ./tools/refresh.sh --silent $config; then
fail=1
fi
# Ensure nuttx and apps directory in clean state
if [ ${CHECKCLEAN} -ne 0 ]; then
if [ -d $nuttx/.git ] || [ -d $APPSDIR/.git ]; then
if [[ -n $(git -C $nuttx status -s) ]]; then
git -C $nuttx status
fail=1
fi
if [[ -n $(git -C $APPSDIR status -s) ]]; then
git -C $APPSDIR status
fail=1
fi
fi
fi
return $fail
}
function refresh_cmake {
# Ensure defconfig in the canonical form
if [ "X$toolchain" != "X" ]; then
if [ ! -z "$original_toolchain" ]; then
kconfig-tweak --file $nuttx/build/.config -e $original_toolchain
fi
kconfig-tweak --file $nuttx/build/.config -d $toolchain
fi
if ! cmake --build build -t refreshsilent 1>/dev/null; then
cmake --build build -t refreshsilent
fail=1
fi
rm -rf build
# Ensure nuttx and apps directory in clean state
if [ ${CHECKCLEAN} -ne 0 ]; then
if [ -d $nuttx/.git ] || [ -d $APPSDIR/.git ]; then
if [[ -n $(git -C $nuttx status -s) ]]; then
git -C $nuttx status
fail=1
fi
if [[ -n $(git -C $APPSDIR status -s) ]]; then
git -C $APPSDIR status
fail=1
fi
fi
fi
# Use -f option twice to remove git sub-repository
git -C $nuttx clean -f -xfdq
git -C $APPSDIR clean -f -xfdq
return $fail
}
function refresh {
# Ensure defconfig in the canonical form
if [ ! -z ${cmake} ]; then
refresh_cmake
else
refresh_default
fi
}
function run {
if [ ${RUN} -ne 0 ]; then
run_script="$path/run.sh"
if [ -x $run_script ]; then
echo " Running NuttX..."
export ARTIFACTCONFDIR=$ARTIFACTDIR/$(echo $config | sed "s/:/\//")/
export CURRENTCONFDIR=$(realpath $path)
if ! $run_script; then
fail=1
fi
fi
fi
return $fail
}
# Coordinate the steps for the next build test
function dotest {
echo "===================================================================================="
config=`echo $1 | cut -d',' -f1`
check=${HOST},${config/\//:}
skip=0
for re in $blacklist; do
if [[ "${check}" =~ ${re:1}$ ]]; then
echo "Skipping: $1"
skip=1
break
fi
done
unset cmake
if [ ${NINJACMAKE} -eq 1 ]; then
for l in $cmakelist; do
if [[ "${config/\//:}" == "${l}" ]]; then
echo "CMake is present: $1"
cmake=1
break
fi
done
fi
echo "Configuration/Tool: $1"
if [ ${PRINTLISTONLY} -eq 1 ]; then
return
fi
# Parse the next line
configdir=`echo $config | cut -s -d':' -f2`
if [ -z "${configdir}" ]; then
configdir=`echo $config | cut -s -d'/' -f2`
if [ -z "${configdir}" ]; then
echo "ERROR: Malformed configuration: ${config}"
showusage
else
boarddir=`echo $config | cut -d'/' -f1`
fi
else
boarddir=`echo $config | cut -d':' -f1`
fi
path=$nuttx/boards/*/*/$boarddir/configs/$configdir
if [ ! -r $path/defconfig ]; then
echo "ERROR: no configuration found at $path"
showusage
fi
unset toolchain
unset original_toolchain
if [ "X$config" != "X$1" ]; then
toolchain=`echo $1 | cut -d',' -f2`
if [ -z "$toolchain" ]; then
echo " Warning: no tool configuration"
fi
fi
# Perform the build test
echo $(date '+%Y-%m-%d %H:%M:%S')
echo "------------------------------------------------------------------------------------"
distclean
if [ ${skip} -ne 1 ]; then
configure
build
run
refresh
else
echo " Skipping: $1"
fi
}
# Build one entry from the test list file. Retry on failure.
function retrytest {
  # Remember the Fail Status and clear it for each build
  local line=$1
  local prevfail=$fail
  local backoff=60 # Initial Backoff Limit, in seconds
  local i delay    # Keep the loop counter and delay out of the global scope

  # Build and retry on failure, with Random Exponential Backoff
  for ((i = 1; i <= maxbuilds; i++)); do
    echo "Build Attempt $i of $maxbuilds"
    fail=0
    dotest $line

    # Don't retry if the build succeeded
    if [ ${fail} -eq 0 ]; then
      break
    else
      # Build Failed: Clean up any corrupted downloads, don't reuse
      git -C $nuttx clean -fd
      git -C $APPSDIR clean -fd
      pushd $nuttx ; git status ; popd
      pushd $APPSDIR ; git status ; popd
    fi

    # If this is Final Retry: Don't retry subsequent builds
    if [ $i -eq $maxbuilds ]; then
      maxbuilds=1
      break
    fi

    # Wait a random 1..backoff seconds, double the backoff, then retry
    delay=$(( (RANDOM % backoff) + 1 ))
    echo "Wait $delay seconds ($backoff backoff)"
    backoff=$((backoff * 2))
    sleep $delay
  done

  # Return the Previous Fail Status, unless this build failed
  if [ ${fail} -eq 0 ]; then
    fail=$prevfail
  fi
}
# Perform the build test for each entry in the test list file
for line in $testlist; do
  firstch=${line:0:1}
  if [ "X$firstch" == "X/" ]; then
    dir=`echo $line | cut -d',' -f1`
    list=`find boards$dir -name defconfig | cut -d'/' -f4,6`
    for i in ${list}; do
      retrytest $i${line/"$dir"/}
    done
  else
    retrytest $line
  fi
done
echo "===================================================================================="
exit $fail