mirror of
https://github.com/apache/nuttx.git
synced 2026-05-13 10:38:40 +08:00
12e8f92a28
In Jan-Feb 2026: NuttX CI hit a [record high usage of GitHub Runners](https://github.com/apache/nuttx/issues/17914), exceeding the limit enforced by ASF Infrastructure Team. We analysed the PRs and discovered that most GitHub Runners were wasted on __(1) Failure to Download the Build Dependencies__ for DTC Device Tree, OpenAMP Messaging, MicroADB Debugger, MCUBoot Bootloader, NimBLE Bluetooth, etc __(2) Resubmitting PR Commits__: - [Video: Analysing the Most Expensive PR](https://youtu.be/swFaxaTCEQg) - [Video: Second Most Expensive PR](https://youtu.be/uSpQkzBogEw) - [Video: Third Most Expensive PR](https://youtu.be/J7w1gyjwZ1w) - [Video: Most Expensive Apps PR](https://youtu.be/182h8cRpfvI) - [Spreadsheet: Most Expensive PRs](https://docs.google.com/spreadsheets/d/1HY7fIZzd_fs3QPyA0TX7vsYOjL86m1fNOf1Wls93luI/edit?gid=70515654#gid=70515654) Why would __Download Failures__ waste GitHub Runners? That's because Download Failures will terminate the Entire CI Build (across All CI Jobs), requiring a restart of the CI Build. And the CI Build isn't terminated immediately upon failure: NuttX CI waits for the CI Job to complete (e.g. arm-01), before terminating the CI Build. Which means that CI Builds can get terminated 2.5 hours into the CI Build, wasting 2.5 elapsed hours x [7.4 parallel processes](https://lupyuen.org/articles/ci3#live-metric-for-full-time-runners) of GitHub Runners. This PR proposes to __Retry the Build for Each CI Target__. NuttX CI shall rebuild each CI Target (e.g. `sim:nsh`), upon failure, up to 3 times (total 4 builds). Each rebuild will be attempted after a Randomised Delay with Exponential Backoff, initially set to 60 seconds, then 120 seconds, 240 seconds. The rebuilds will mitigate the effects of Intermittent Download Failures that occur in GitHub Actions. (And eliminate developer frustration) If the build fails after 3 retries: Subsequent CI Targets will __not be allowed to rebuild__ upon failure. 
This is to prevent cascading build failures from overloading GitHub Actions, and consuming too many GitHub Runners. Note that NuttX CI shall retry the build for __Any Kind of Build Failure__, including Download Failures, Compile Errors and Config Errors. We designed it simplistically due to our current constraints: (1) Lack of CI Expertise (2) NuttX CI is Mission Critical (3) Legacy CI Scripts are Highly Complex. To prevent Compile Errors and Config Errors: We expect NuttX Devs to [Build and Test PRs in Our Own Repos](https://github.com/apache/nuttx/issues/18568), before submitting to NuttX. What about __Resubmitting PR Commits__ and its wastage of GitHub Runners? We also require NuttX Devs to [Build and Test PRs in Our Own Repos](https://github.com/apache/nuttx/issues/18568), before resubmitting to NuttX. GitHub Runners will then be charged to the developer's quota, without affecting the GitHub Runners quota for Apache NuttX Project. We plan to [Kill All CI Jobs](https://youtu.be/182h8cRpfvI?si=MmAuwLISZPPMoqDq&t=1479) for PRs that have been switched to Draft Mode. We'll monitor this through the [NuttX Build Monitor](https://github.com/apache/nuttx/issues/18659). Modified Files: `tools/testbuild.sh`: We introduce a New Wrapper Function `retrytest` that will call the Existing Function `dotest`, to build the CI Target and retry on error. `Documentation/components/tools/testbuild.rst`: Updated the `testbuild.sh` doc with the Retry Logic. Signed-off-by: Lup Yuen Lee <luppy@appkaki.com>
85 lines
3.5 KiB
ReStructuredText
================
``testbuild.sh``
================

This script automates building a set of configurations. The intent is
simply to ensure that every configuration in the set builds correctly.
The ``-h`` option shows the usage:

.. code:: console

   $ ./testbuild.sh -h

   USAGE: tools/testbuild.sh -h [-l|m|c|g|n] [-d] [-e <extraflags>] [-x] [-j <ncpus>] [-a <appsdir>] [-t <topdir>] [-p]
          [-A] [-C] [-G] [-N] [-R] [-S] [--codechecker] <testlist-file>

   Where:
     -h will show this help test and terminate
     -l|m|c|g|n selects Linux (l), macOS (m), Cygwin (c),
        MSYS/MSYS2 (g) or Windows native (n). Default Linux
     -d enables script debug output
     -e pass extra c/c++ flags such as -Wno-cpp via make command line
     -x exit on build failures
     -j <ncpus> passed on to make. Default: No -j make option.
     -a <appsdir> provides the relative path to the apps/ directory. Default ../apps
     -t <topdir> provides the absolute path to top nuttx/ directory. Default ../nuttx
     -p only print the list of configs without running any builds
     -A store the build executable artifact in ARTIFACTDIR (defaults to ../buildartifacts)
     -C Skip tree cleanness check.
     -G Use "git clean -xfdq" instead of "make distclean" to clean the tree.
        This option may speed up the builds. However, note that:
          * This assumes that your trees are git based.
          * This assumes that only nuttx and apps repos need to be cleaned.
          * If the tree has files not managed by git, they will be removed
            as well.
     -N Use CMake with Ninja as the backend.
     -R execute "run" script in the config directories if exists.
     -S Adds the nxtmpdir folder for third-party packages.
     --codechecker enables CodeChecker statically analyze the code.
     <testlist-file> selects the list of configurations to test. No default

   Your PATH variable must include the path to both the build tools and the
   kconfig-frontends tools

This script needs two pieces of information:

1. A description of the platform that you are testing on. This description
   is provided by the optional ``-l``, ``-m``, ``-c``, ``-g`` and ``-n``
   options.

2. A list of configurations to build. That list is provided by a test
   list file. The final, non-optional parameter, ``<testlist-file>``,
   provides the path to that file.

The test list file is a sequence of build descriptions, one per line. Each
build description consists of up to two comma-separated values. For example::

   stm32f429i-disco:nsh
   arduino-due:nsh
   /arm
   /risc-v

The first value is the usual configuration description, of the form
``<board-name>:<configuration-name>`` or ``/<folder-name>``, and must
correspond to a configuration or folder in the ``nuttx/boards`` directory.

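To illustrate how a single ``/<folder-name>`` entry can stand for many
configurations, here is a minimal sketch, not the script's actual code. It
assumes the usual NuttX layout
``boards/<arch>/<chip>/<board>/configs/<config>/defconfig``; the helper name
``expand_folder`` is hypothetical, and the matching and output format used
inside testbuild.sh may differ.

.. code:: bash

   # Hypothetical helper (not part of testbuild.sh):
   #   expand_folder <boards-dir> </folder-name>
   # Prints "<board-name>:<configuration-name>" for every defconfig under
   # <boards-dir><folder-name>, assuming the layout
   # <arch>/<chip>/<board>/configs/<config>/defconfig.
   expand_folder() {
     local boardsdir=$1
     local folder=$2
     local defconfig cfgdir board config

     find "${boardsdir}${folder}" -type f -name defconfig \
          -path '*/configs/*' 2>/dev/null | sort |
     while read -r defconfig; do
       cfgdir=$(dirname "$defconfig")   # .../<board>/configs/<config>
       config=$(basename "$cfgdir")     # <config>
       board=$(basename "$(dirname "$(dirname "$cfgdir")")")  # <board>
       echo "${board}:${config}"
     done
   }

For example, ``expand_folder nuttx/boards /arm`` would list every ARM
configuration found under ``nuttx/boards/arm``, one per line.
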
The second value is a valid name for a toolchain configuration to use
when building the configuration. The set of valid toolchain configuration
names depends on the underlying architecture of the configured board.

The prefix ``-`` can be used to skip a configuration::

   -stm32f429i-disco/nsh

or to skip a configuration only on a specific host (e.g. Darwin)::

   -Darwin,sim:rpserver

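The line formats described above can be summarised as a small classifier.
This is a hedged sketch rather than testbuild.sh's own parser: the helper
name ``classify_entry`` and its ``build``/``skip`` output are hypothetical,
and it assumes that a ``-`` entry containing a comma is always of the
host-specific form ``-<host>,<configuration>``, with ``<host>`` compared
against the output of ``uname -s``.

.. code:: bash

   # Hypothetical helper (not part of testbuild.sh):
   #   classify_entry <line> [<host>]
   # Prints "build <config> [<toolchain>]" for an entry to build,
   # "skip <config>" for an entry excluded on this host, and nothing
   # for a host-specific skip that targets a different host.
   classify_entry() {
     local line=$1
     local host=${2:-$(uname -s)}
     local first second=""

     first=${line%%,*}              # text before the first comma
     if [[ "$line" == *,* ]]; then
       second=${line#*,}            # text after the first comma
     fi

     if [[ "$first" != -* ]]; then
       # "<board>:<config>" or "/<folder>", optionally with a toolchain
       echo "build ${first}${second:+ $second}"
     elif [[ -z "$second" ]]; then
       # "-<config>": skip on every host
       echo "skip ${first#-}"
     elif [[ "${first#-}" == "$host" ]]; then
       # ASSUMPTION: "-<host>,<config>" skips <config> only on <host>
       echo "skip $second"
     fi
     return 0
   }

On a Linux host, ``classify_entry '-Darwin,sim:rpserver'`` prints nothing,
because that entry only applies when building on Darwin.
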
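This script will rebuild each configuration up to 3 times upon failure,
so a failing configuration is built at most 4 times in total. Each rebuild
is attempted after a randomised delay with exponential backoff: initially
60 seconds, then 120 seconds, then 240 seconds. The rebuilds mitigate the
effects of intermittent download failures that occur in GitHub Actions.
Note that the script retries on any kind of build failure, including
download failures, compile errors and configuration errors.
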
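The backoff schedule can be computed as below. This is a sketch under
stated assumptions: ``backoff_delay`` is a hypothetical helper, and because
the exact randomisation is not specified here, it is modelled as extra
jitter of up to 50% of the nominal delay.

.. code:: bash

   # Hypothetical helper (not part of testbuild.sh):
   #   backoff_delay <retry> [<base-seconds>]
   # Prints the delay in seconds before retry number <retry> (1, 2, 3).
   # The nominal delay doubles on each retry: base, 2*base, 4*base.
   # ASSUMPTION: randomisation is modelled as extra jitter of up to
   # 50% of the nominal delay.
   backoff_delay() {
     local retry=$1
     local base=${2:-60}
     local nominal jitter

     nominal=$(( base * (1 << (retry - 1)) ))
     jitter=$(( RANDOM % (nominal / 2 + 1) ))
     echo $(( nominal + jitter ))
   }

For example, ``sleep "$(backoff_delay 2)"`` would wait between 120 and 180
seconds before the second retry.
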
If the build still fails after 3 retries, subsequent configurations will
not be allowed to rebuild upon failure. This prevents cascading build
failures from overloading GitHub Actions and consuming too many GitHub
runners.

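The two rules above (per-configuration retries, plus a global cut-off once
any configuration exhausts its retries) can be combined in a single wrapper.
In testbuild.sh this role is played by a ``retrytest`` function that wraps
the existing ``dotest`` function; the sketch below is not that code.
``retry_build``, ``build_config``, ``MAX_RETRIES``, ``RETRIES_EXHAUSTED`` and
``RETRY_BASE_DELAY`` are hypothetical names, and the jitter is an assumption.

.. code:: bash

   # Hypothetical sketch (not the actual retrytest/dotest code).
   # build_config is a placeholder for the function that builds one
   # configuration; it must be defined before retry_build is called.
   MAX_RETRIES=3
   RETRIES_EXHAUSTED=0   # becomes 1 once any configuration fails all retries

   retry_build() {
     local config=$1
     local base=${RETRY_BASE_DELAY:-60}
     local retry delay

     if build_config "$config"; then
       return 0
     fi

     # An earlier configuration already used up its retries: fail fast.
     if [ "$RETRIES_EXHAUSTED" -eq 1 ]; then
       return 1
     fi

     for (( retry = 1; retry <= MAX_RETRIES; retry++ )); do
       # Exponential backoff (base, 2*base, 4*base) plus assumed jitter
       # of up to 50%, since the exact randomisation is not specified.
       delay=$(( base * (1 << (retry - 1)) ))
       delay=$(( delay + RANDOM % (delay / 2 + 1) ))
       echo "Retry $retry of $MAX_RETRIES for $config in ${delay}s" >&2
       sleep "$delay"
       if build_config "$config"; then
         return 0
       fi
     done

     # Every retry failed: later configurations get no retries.
     RETRIES_EXHAUSTED=1
     return 1
   }

Keeping ``RETRIES_EXHAUSTED`` as a single global flag, rather than tracking
it per configuration, is what stops a cascade of failures from triggering a
long series of delayed rebuilds.
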