Documentation: move debug pages to a separate top level page

Move debug related pages from Guides to a separate top level page.
This way all pages related to debugging will be in one place
which is more user friendly.

Related GitHub issue: https://github.com/apache/nuttx/issues/15667

Signed-off-by: raiden00pl <raiden00@railab.me>
raiden00pl
2025-03-22 08:15:08 +01:00
committed by Xiang Xiao
parent a4777f153b
commit 9394962cf6
30 changed files with 22 additions and 14 deletions
@@ -0,0 +1,78 @@
=========
Core Dump
=========
Overview
========
.. image:: image/coredump-overview.png
How to use
-----------
1. Enable NuttX Core dump
Enable Kconfig
.. code-block:: console
CONFIG_COREDUMP=y /* Enable Coredump */
CONFIG_BOARD_COREDUMP_SYSLOG=y /* Enable Board Coredump, if exceptions and assertions occur, */
CONFIG_SYSTEM_COREDUMP=y /* Enable the coredump user command, which captures the current
state of one or all threads while the system is running; the
output can be redirected to the console or a file */
CONFIG_BOARD_COREDUMP_COMPRESSION=y /* Default y, enable Coredump compression to
reduce the size of the original core image */
CONFIG_BOARD_COREDUMP_FULL=y /* Default y, save all task information */
2. Run Coredump on nsh (CONFIG_SYSTEM_COREDUMP=y)
Parameters of coredump tool
.. code-block:: console
$ coredump <pid> /* If pid is specified, coredump captures only the thread with the
specified pid; otherwise all threads are captured */
$ coredump <filename> /* If filename is specified, the coredump is written to that
file; otherwise it is written to the stdout stream */
3. Capture coredump from stdout
Save the output marked by the red frame in the figure to a file
.. image:: image/coredump-hexdump.png
.. code-block:: console
$ cat elf.dump
[CPU0] [ 6] 5A5601013D03FF077F454C4601010100C0000304002800C00D003420036000070400053400200008200A4000000420030034C024200001D8092004E00200601A
...
[CPU0] [ 6] 401B018D37814720005A5601000800090100006000010000
4. Convert the dump file
If the core file was LZF-compressed and emitted as a hexdump stream, run the coredump script (`tools/coredump.py
<https://github.com/apache/nuttx/blob/master/tools/coredump.py>`_) to convert the hex back to binary and decompress it. If the ``-o`` parameter is not given on the command line, the output is written to ``<original file name>.core``:
.. code-block:: console
$ ./nuttx/tools/coredump.py elf.dump
Core file conversion completed: elf.core
5. Analysis by gdb
After generating ``elf.core``, load it together with the compiled ``nuttx`` ELF image to view the call stack and register contents of all threads directly in gdb:
(NOTE: Toolchain version must be newer than 11.3)
.. code-block:: console
$ prebuilts/gcc/linux/arm/bin/arm-none-eabi-gdb -c elf.core nuttx
.. image:: image/coredump-gdb.png
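The hex-to-binary half of step 4 can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of ``tools/coredump.py``: the helper name ``hexdump_to_bytes`` is hypothetical, it assumes the hex payload is the last whitespace-separated field of each log line, and it deliberately leaves out the LZF decompression step.

.. code-block:: python

   import re

   # Assumption: a payload is an even-length run of hex digits.
   HEX_RE = re.compile(r"[0-9A-Fa-f]+")

   def hexdump_to_bytes(log_text):
       """Concatenate the hex payload of every log line into raw bytes."""
       out = bytearray()
       for line in log_text.splitlines():
           fields = line.split()
           if not fields:
               continue
           payload = fields[-1]  # e.g. "5A5601" in "[CPU0] [ 6] 5A5601"
           # Skip prompts, "..." and anything that is not clean hex.
           if len(payload) % 2 or not HEX_RE.fullmatch(payload):
               continue
           out += bytes.fromhex(payload)
       return bytes(out)

   sample = "[CPU0] [ 6] 5A5601\n...\n[CPU0] [ 6] 0008\n"
   print(hexdump_to_bytes(sample).hex())  # 5a56010008

A log line whose last field merely happens to look like hex would be picked up as well, so a real converter needs a stricter line filter than this sketch uses.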
@@ -0,0 +1,78 @@
======================================
Coresight - HW Assisted Tracing on ARM
======================================
Overview
--------
Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoC. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.
HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. Developers
can monitor the behavior of their software as it runs on the device, view
real-time data about its execution, and identify and debug issues quickly.
Coresight components are generally categorised as sources, links and sinks.
The source devices generate a compressed stream representing the processor
instruction path based on tracing scenarios. The link devices are responsible
for transferring the stream from the source device to the sink device. The sink
devices serve as endpoints to the coresight implementation, either storing
the compressed stream in a memory buffer or creating an interface to the
outside world where data can be transferred to a host without fear of filling
up the onboard coresight memory buffer.
Refer to the following document for more details:
https://developer.arm.com/documentation/102520/latest/
Acronyms and Classification
---------------------------
Acronyms:
PTM:
Program Trace Macrocell
ETM:
Embedded Trace Macrocell
STM:
System Trace Macrocell
ETB:
Embedded Trace Buffer
ITM:
Instrumentation Trace Macrocell
TPIU:
Trace Port Interface Unit
TMC-ETR:
Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF:
Trace Memory Controller, configured as Embedded Trace FIFO
Classification:
Source:
ETM, STM, ITM
Link:
Funnel, replicator, TMC-ETF
Sinks:
ETB, TPIU, TMC-ETR
Framework and implementation
----------------------------
The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. Any coresight compliant device can
register with the framework as long as it uses the right APIs:
.. c:function:: int coresight_register(FAR struct coresight_dev_s *csdev, FAR const struct coresight_desc_s *desc);
.. c:function:: void coresight_unregister(FAR struct coresight_dev_s *csdev);
``struct coresight_desc_s *desc`` describes the type of the current coresight device
and where it connects to. When all the coresight devices are registered,
the devices along the tracing stream path can be enabled by calling:
.. c:function:: int coresight_enable(FAR struct coresight_dev_s *srcdev, FAR struct coresight_dev_s *destdev);
The ``coresight_enable`` function will build the path from ``srcdev`` to
``destdev`` according to the ``struct coresight_desc_s *desc``.
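To make the idea of "building a path" concrete, here is a toy model in Python. It is not the NuttX implementation: the device names (``etm0``, ``funnel0``, ``etf0``, ``etr0``) and the ``connections`` dictionary, which stands in for the connection information a descriptor would carry, are hypothetical. The sketch only shows that a source-to-sink path can be found by walking the declared connections.

.. code-block:: python

   def build_path(connections, src, dest):
       """Return the device names from src to dest, or None if unreachable.

       connections maps a device name to the devices its outputs feed.
       """
       stack = [[src]]
       while stack:
           path = stack.pop()
           node = path[-1]
           if node == dest:
               return path
           for nxt in connections.get(node, ()):
               if nxt not in path:  # never revisit a device on this path
                   stack.append(path + [nxt])
       return None

   # Hypothetical topology: two sources share one funnel (link),
   # which feeds an ETF (link) and finally an ETR (sink).
   connections = {
       "etm0": ["funnel0"],
       "stm0": ["funnel0"],
       "funnel0": ["etf0"],
       "etf0": ["etr0"],
   }
   print(build_path(connections, "etm0", "etr0"))
   # ['etm0', 'funnel0', 'etf0', 'etr0']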
@@ -0,0 +1,203 @@
=============================
Analyzing Cortex-M Hardfaults
=============================
.. epigraph::
> I have a build of PX4 (NuttX 6.29 with some patches) with new
> lpc43xx chip files on 4337 chip running from FLASH (master
> vanilla NuttX has no such problem). This gives me a hardfault
> below if I stress NSH console (UART2) with some big output.
>
> I read some threads but can't get a clue how to analyze the
> dump and where to look first:
>
> 1bXXX and 1aXXX addresses are FLASH. 100XXX addresses are RAM
.. code-block:: console
Assertion failed at file:armv7-m/up_hardfault.c line: 184 task: hpwork
sp: 10001eb4
IRQ stack:
base: 10001f00
size: 000003fc
10001ea0: 1b02d961 1b03f07e 10001eb4 10005ed8 1a0312ab 1b03f600 000000b8 1b02d961
10001ec0: 00000010 10001f40 00000003 00000000 1a03721d 1a037209 1b02d93b 00000000
10001ee0: 1a0371f5 00000000 00000000 00000000 00000000 00000000 1a0314a5 10005d7c
sp: 10005e50
User stack:
base: 10005ed8
size: 00000f9c
10005e40: 00000000 00000000 00000000 1b02d587 10004900 00000000 005b8d7f 00000000
10005e60: 1a030f2e 00000000 00000000 00001388 00000000 00000005 10001994 00000000
10005e80: 00000000 00000000 00000000 1b02c359 00000000 00000000 00000000 004c4b40
10005ea0: 000002ff 00000000 00000000 1a030f2f 00000000 00000000 00000000 00000000
10005ec0: 00000000 1a030f41 00000000 1b02c2a5 00000000 00000000 ffffffff 00bdeb39
R0: ffffffff 00000000 00000016 00000000 00000000 00000000 00000000 00000000
R8: 100036d8 00000000 00000000 004c4b40 10001370 10005e50 1b02b20b 1b02d596
xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9
This question was asked in the old Yahoo! Group for NuttX, before the
project joined the Apache Software Foundation. The old forum no longer
exists, but the thread has been archived at
`Narkive <https://nuttx.yahoogroups.narkive.com/QNbG3r5l/hardfault-help-analysing-where-to-start>`_
(third party external link).
Analyzing the Register Dump
===========================
First, in the register dump:
.. code-block:: console
R0: ffffffff 00000000 00000016 00000000 00000000 00000000 00000000 00000000
R8: 100036d8 00000000 00000000 004c4b40 10001370 10005e50 1b02b20b 1b02d596
xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
``R15`` is the PC at the time of the crash (``1b02d596``). In order to
see where this is, I do this:
.. code-block:: console
arm-none-eabi-objdump -d nuttx | vi -
Of course, you can use any editor you prefer. In any case, this will
provide a full assembly language listing of your FLASH content along
with complete symbolic information.
**TIP:** Not comfortable with ARM assembly language? Try the
``objdump --source`` (or just ``-S``) option. That will intermix the C
and the assembly language code so that you can see which C statements
the assembly language is implementing.
Once you have the FLASH image in the editor, it is then a simple thing
to do the search in order to find the instruction at ``1b02d596``. The
symbolic information will show you exactly which function the address
is in and also the context of the instruction that can be used to
associate it to the exact line of code in the original C source file.
You also have all of the register contents so it is pretty easy to see
what happened (assuming you have some basic knowledge of Thumb2
assembly language and the ARM EABI). But it is usually not so easy to
see why it happened.
The rest of the instructions apply to finding out why the fault
happened.
``R14`` often contains the return address to the caller of the
offending function. Bit 0 is set in this return address, but ignore
that (i.e., use ``1b02b20a`` instead of ``1b02b20b``). Use the objdump
command above to see where that is.
Sometimes, however, ``R14`` is not the caller of the offending
function. If the offending function calls some other function then
``R14`` will be overwritten. But no problem, it will also then have
pushed the return address on the stack where we can find it by
analyzing the stack dump.
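The adjustment of the ``R14`` value is a single bit operation. As a small illustration (the helper name is hypothetical), clearing bit 0 of a Thumb return address gives the address to look up in the objdump listing:

.. code-block:: python

   def thumb_to_code_address(value):
       """Clear bit 0 (the Thumb state bit) of a return address."""
       return value & ~1

   lr = 0x1B02B20B  # R14 from the register dump above
   print(hex(thumb_to_code_address(lr)))  # 0x1b02b20a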
Analyzing the Stack Dump
========================
The Task Stack
--------------
To go further back in time, you have to analyze the stack. It is a
push down stack so older events are at higher stack addresses; the
most recent things that happened will be at lower stack addresses.
Analyzing the stack is done in basically the same way:
1. Start at the highest stack addresses (oldest) and work forward in
time (lower addresses)
2. Find interesting addresses,
3. Use ``arm-none-eabi-objdump`` to determine where those addresses
are in the code.
An interesting address has these properties:
1. It lies in FLASH in your architecture. In your case these are the
addresses that begin with ``0x1a`` and ``0x1b``. Other
architectures may have different FLASH addresses or even addresses
in RAM.
2. The interesting addresses are all odd for Cortex-M, that is, bit 0
will be set. This is because as the code progresses, the return
address (``R14``) will be pushed on the stack. All of the return
addresses will lie in FLASH and will be odd.
Even FLASH addresses in the stack dump usually are references to
``.rodata`` in FLASH but are sometimes of interest as well. Below are
examples of interesting addresses (in brackets):
.. code-block:: console
sp: 10005e50
User stack:
base: 10005ed8
size: 00000f9c
10005e40: 00000000 00000000 00000000 [1b02d587] 10004900 00000000 005b8d7f 00000000
10005e60: 1a030f2e 00000000 00000000 00001388 00000000 00000005 10001994 00000000
10005e80: 00000000 00000000 00000000 [1b02c359] 00000000 00000000 00000000 004c4b40
10005ea0: 000002ff 00000000 00000000 [1a030f2f] 00000000 00000000 00000000 00000000
10005ec0: 00000000 [1a030f41] 00000000 [1b02c2a5] 00000000 00000000 ffffffff 00bdeb39
That will give the full backtrace up to the point of the failure.
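The skim for interesting addresses can also be mechanized. The following is a minimal sketch under the assumptions of this example: FLASH addresses are those whose top byte is ``0x1a`` or ``0x1b``, and each dump line has the form ``address: word word ...``. The function name and the ``FLASH_TOP_BYTES`` constant are illustrative and must be adapted to other chips.

.. code-block:: python

   # Assumption: FLASH banks of the example chip start at 0x1a... and 0x1b...
   FLASH_TOP_BYTES = {0x1A, 0x1B}

   def interesting_addresses(dump_text):
       """Return odd words that point into FLASH, in order of appearance."""
       found = []
       for line in dump_text.splitlines():
           _addr, sep, words = line.partition(":")
           if not sep:
               continue  # not an "address: words" line
           for word in words.split():
               try:
                   value = int(word.strip("[]"), 16)
               except ValueError:
                   continue
               # Bit 0 set (Thumb return address) and located in FLASH.
               if value & 1 and (value >> 24) in FLASH_TOP_BYTES:
                   found.append(value)
       return found

   dump = """\
   10005e40: 00000000 00000000 00000000 1b02d587 10004900 00000000 005b8d7f 00000000
   10005ec0: 00000000 1a030f41 00000000 1b02c2a5 00000000 00000000 ffffffff 00bdeb39
   """
   print([hex(a) for a in interesting_addresses(dump)])
   # ['0x1b02d587', '0x1a030f41', '0x1b02c2a5']

Each reported value still has bit 0 set; clear it before searching for the address in the objdump listing.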
The Interrupt Stack
-------------------
Note that in some cases there are two stacks listed. The interrupt
stack will be present if (1) the interrupt stack is enabled, and (2)
you are in an interrupt handler at the time that the failure occurred:
.. code-block:: console
Assertion failed at file:armv7-m/up_hardfault.c line: 184 task: hpwork
sp: 10001eb4
IRQ stack:
base: 10001f00
size: 000003fc
10001ea0: [1b02d961] 1b03f07e 10001eb4 10005ed8 1a0312ab 1b03f600 000000b8 [1b02d961]
10001ec0: 00000010 10001f40 00000003 00000000 [1a03721d] [1a037209] [1b02d93b] 00000000
10001ee0: [1a0371f5] 00000000 00000000 00000000 00000000 00000000 [1a0314a5] 10005d7c
(Interesting addresses again in brackets).
The interrupt stack is sometimes interesting, for example when the
interrupt was caused by logic operating at the interrupt level. In
this case, it is probably not so interesting since the fault was probably
caused by normal task code and the interrupt stack probably just shows
the normal operation of the interrupt handling logic.
Full Stack Analysis
-------------------
What I have proposed here is just skimming through the stack, finding
and interpreting interesting addresses. Sometimes you need more
information and you need to analyze the stack in more detail. That is
also possible because every word on the stack is there because of an
explicit push instruction in the code (usually a push instruction on
Cortex-M or an stmdb instruction in other ARM architectures). This is
painstaking work but can also be done to provide a more detailed
answer to "what happened?"
Recovering State at the Time of the Hardfault
=============================================
Here is another tip from Mike Smith:
.. epigraph::
"... for systems like NuttX where catching hardfaults is difficult,
you can recover the faulting PC, LR and SP (by examining the
exception stack), then write these values back into the appropriate
processor registers (adjust the PC as necessary for the fault).
"This will put you back in the application code at the point at
which the fault occurred. Some local variables will show as having
invalid values (because at the time of the fault they were live in
registers and have been overwritten by the exception handler), but
the stack frame, function arguments etc. should all show correctly."
@@ -0,0 +1,84 @@
==============================
Debugging ELF Loadable Modules
==============================
.. warning::
Migrated from:
https://cwiki.apache.org/confluence/display/NUTTX/Debugging+ELF+Loadable+Modules
Debugging ELF modules loaded in memory can be tricky because the load address
in memory does not match the addresses in the ELF file. This challenge has long
existed for debugging uClinux programs and Linux kernel modules; the same
solution can be used with NuttX ELF files (and probably with NxFLAT modules as
well). Below is a summary of one way to approach this:
1. Get ELF Module Load Address
==============================
Put a change in ``nuttx/binfmt`` so that you print the address where the ELF
text was loaded into memory.
Turning on BINFMT debug (``CONFIG_DEBUG_BINFMT=y``) should give you the same
information, although it may also provide more output than you really want.
Alternatively, you could place a ``printf()`` at the beginning of your ``main()``
function so that your ELF module can print its own load address. For example,
the difference between the address of ``main()`` in your object file and the
address of ``main()`` at run time reveals the actual load address.
2. Make the ELF Module Wait for You
===================================
Insert an infinite loop in the ``main()`` routine of your ELF program. For
example:
.. code-block:: c
#include <stdbool.h>
volatile bool waitforme;
int main(int argc, char **argv)
{
while (!waitforme);
...
When you start the ELF program, you will see where it was loaded in memory, and
the ELF program will remain stuck in the infinite loop. It will continue to
wait for ``waitforme`` to become true before proceeding.
3. Start the Debugger
=====================
Start the debugger, connect to the GDB server, and halt the program. If your
debugger is well-behaved, it should stop at the infinite loop in ``main()``.
4. Load Offset Symbols
======================
Load symbols using the offset where the ELF module was loaded:
.. code-block:: shell
(gdb) add-symbol-file <myprogram> <load-address>
Here, ``<myprogram>`` is your ELF file containing symbols, and
``<load-address>`` is the address where the program text was actually loaded (as
determined above). Single-step a couple of times and confirm that you are in the
infinite loop.
5. And Debug
============
Set ``waitforme`` to a non-zero value. Execution should exit the infinite loop,
and now you can debug the ELF program loaded into RAM in the usual way.
An Easier Way?
==============
There might be an alternative that allows you to step into the ELF module
without modifying the code to include the ``waitforme`` loop. You could place a
breakpoint on the OS function ``task_start()``. That function runs before your
ELF program starts, so you should be able to single-step from the OS code
directly into your loaded ELF application—no changes to the ELF application
required.
When you step into the application's ``main()``, you have the relocated address
of ``main()`` and can use that address (see step #1) to compute the load offset.
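The load offset arithmetic from step #1 is a plain subtraction. The sketch below uses made-up addresses and assumes, as step #1 does, that the text section of the ELF file is linked at address 0, so that the difference between the two ``main()`` addresses is the text load address. The function name and both numbers are hypothetical.

.. code-block:: python

   def load_offset(runtime_main, elf_main):
       """Difference between main() at run time and main() in the ELF file."""
       return runtime_main - elf_main

   elf_main = 0x00000120      # hypothetical: main() as listed by objdump/nm
   runtime_main = 0x20004120  # hypothetical: main() printed by the module itself
   offset = load_offset(runtime_main, elf_main)
   print(f"add-symbol-file myprogram {offset:#x}")
   # add-symbol-file myprogram 0x20004000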
@@ -0,0 +1,111 @@
===================================================================
Debugging / flashing NuttX on ARM with hardware debugger (JTAG/SWD)
===================================================================
.. warning::
Migrated from:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=139629444
NOTE: If you experience the issues described on this page, you can enable the
configuration option below to resolve them.
.. code-block:: makefile
CONFIG_STM32_DISABLE_IDLE_SLEEP_DURING_DEBUG=y
What's the problem?
===================
On some architectures (like ARM Cortex-M3) the Idle thread halts the core
with the WFI (Wait For Interrupt) assembly instruction. This effectively stops
clocking of the core, which is resumed only by some enabled interrupt. This
causes hardware debuggers to believe that they were disconnected from the
target, as they lose connection with the now stopped core. For example OpenOCD
shows errors like these the moment you start the target:
.. code-block:: console
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 100ms
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 300ms
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 700ms
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 1500ms
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 3100ms
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 6300ms
Error: jtag status contains invalid mode value - communication failure
Polling target failed, GDB will be halted. Polling again in 6300ms
This makes debugging the code impossible and flashing the chip much harder -
you have to connect to the chip at the right moment (when it's not stopped
by WFI) - the chances of doing that are proportional to the load
of your system (if your chip spends 99% of time in Idle mode, you have 1%
chance of connecting and halting it).
Solution
========
Some ARM cores that support disabling of clocking after WFI instruction have
special configuration options to make debugging possible. One example is STM32
family - with its ``DBGMCU->CR`` register it's possible to keep the core
clocked during power-down modes. If your chip supports such configuration you
should put it in some early stage of initialization, like in
``stm32_boardinitialize()`` function. The following code demonstrates the
change for STM32:
.. code-block:: c
uint32_t cr = getreg32(STM32_DBGMCU_CR);
cr |= DBGMCU_CR_STANDBY | DBGMCU_CR_STOP | DBGMCU_CR_SLEEP;
putreg32(cr, STM32_DBGMCU_CR);
If your chip doesn't provide such options, the only alternative is to avoid
the WFI instruction in the ``up_idle()`` function.
It should be noted that such modification should be done only for development
stage, as keeping the core clocked during power-down modes contradicts the
major purpose of using them - reducing power usage.
In the rare case that you still have problems with connecting to the target
(especially after power cycle), you should try connecting and halting the chip
under reset (this is supported by new versions of OpenOCD), by holding the
reset button while starting OpenOCD or by configuring OpenOCD to do that for
you.
Work-around
-----------
If you keep the RESET button pressed while running the OpenOCD command to
connect to the target, the connection will succeed. After connecting you need
to keep the reset button pressed until you open the telnet connection
(``telnet 127.0.0.1 4444``) and execute ``reset halt``:
.. code-block:: console
> reset halt
timed out while waiting for target halted
TARGET: stm32f1x.cpu - Not halted
in procedure 'reset'
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x080003d0 msp: 0x20001278
Then release the RESET button and it will reset correctly.
This work-around was tested on viewtool-stm32f107 board and bypassed the above
error reported by OpenOCD. The SWD programmer was a STLink-V2 and this was
the command to connect:
.. code-block:: console
openocd -f interface/stlink-v2.cfg -f target/stm32f1x_stlink.cfg
The OpenOCD version used was: Open On-Chip Debugger 0.8.0-dev-00307-g215c41c
(git commit 215c41c)
@@ -0,0 +1,15 @@
=========================================
Disabling the Stack Dump During Debugging
=========================================
.. warning::
Migrated from:
https://cwiki.apache.org/confluence/display/NUTTX/Disabling+the+Stack+Dump+During+Debugging
The stack dump routine can clutter the output of GDB during debugging.
To disable it, set this configuration option in the defconfig file of
the board configuration:
.. code-block:: c
CONFIG_ARCH_STACKDUMP=n
@@ -0,0 +1,44 @@
=======
irqinfo
=======
``irqinfo`` is a custom GDB command that prints information about the IRQs in the system.
The output includes the IRQ number, the number of times the IRQ has been triggered,
the total time spent in the IRQ handler, the rate of the IRQ, the IRQ handler function,
and the handler's argument.
The argument is resolved to a symbol name if possible.
It is similar to the nsh command ``irqinfo`` but works in GDB. See :ref:`cmdirqinfo` for more information.
The ``RATE`` column is not available.
.. tip::
To show the ``COUNT`` column, you need to enable the ``CONFIG_SCHED_IRQMONITOR`` option in the NuttX configuration.
Syntax
------
``irqinfo``
Example
-------
.. code-block:: bash
(gdb) irqinfo
IRQ COUNT TIME RATE HANDLER ARGUMENT
0 0 0 N/A mps_reserved 0x0 <sensor_unregister>
2 0 0 N/A mps_nmi 0x0 <sensor_unregister>
3 0 0 N/A arm_hardfault 0x0 <sensor_unregister>
4 0 0 N/A arm_memfault 0x0 <sensor_unregister>
5 0 0 N/A arm_busfault 0x0 <sensor_unregister>
6 0 0 N/A arm_usagefault 0x0 <sensor_unregister>
11 1 0 N/A arm_svcall 0x0 <sensor_unregister>
12 0 0 N/A arm_dbgmonitor 0x0 <up_debugpoint_remove>
14 0 0 N/A mps_pendsv 0x0 <up_debugpoint_remove>
15 6581421 0 N/A systick_interrupt 0x100010c <g_systick_lower>
49 2 0 N/A uart_cmsdk_tx_interrupt 0x1000010 <g_uart0port>
50 0 0 N/A uart_cmsdk_rx_interrupt 0x1000010 <g_uart0port>
59 2 0 N/A uart_cmsdk_ov_interrupt 0x1000010 <g_uart0port>
(gdb)
@@ -0,0 +1,152 @@
=========
gdbserver
=========
Introduction
============
This tool can utilize a crash log on a PC to simulate a set of GDB server functionalities,
enabling the use of GDB to debug the context of a NuttX crash.
The script is located at ``tools/gdbserver.py``.
Usage
=====
We can use ``-h`` to get help information:
.. code-block:: bash
$ usage: gdbserver.py [-h] -e ELFFILE [-l LOGFILE] [-a {arm,arm-a,arm-t,riscv,esp32s3,xtensa}] [-p PORT] [-g GDB] [-i [INIT_CMD]]
[-r [RAWFILE ...]] [-c [COREDUMP]] [--debug]
options:
-h, --help show this help message and exit
-e ELFFILE, --elffile ELFFILE
elffile
-l LOGFILE, --logfile LOGFILE
logfile
-a {arm,arm-a,arm-t,riscv,esp32s3,xtensa}, --arch {arm,arm-a,arm-t,riscv,esp32s3,xtensa}
Only use if can't be learnt from ELFFILE.
-p PORT, --port PORT gdbport
-g GDB, --gdb GDB provided a custom GDB path, automatically start GDB session and exit gdbserver when exit GDB.
-i [INIT_CMD], --init-cmd [INIT_CMD]
provided a custom GDB init command, automatically start GDB sessions and input what you provide. if you don't
provide any command, it will use default command [-ex 'bt full' -ex 'info reg' -ex 'display /40i $pc-40'].
-r [RAWFILE ...], --rawfile [RAWFILE ...]
rawfile is a binary file, args format like ram.bin:0x10000 ...
-c [COREDUMP], --coredump [COREDUMP]
coredump file, will prase memory in this file
--debug if enabled, it will show more logs.
Log Example
===========
1. Use ``./tools/configure.sh esp32s3-devkit:nsh`` and disable ``CONFIG_NSH_DISABLE_MW``.
2. ``make -j``
3. Flash image to esp32s3-devkit.
4. Run ``minicom -D /dev/ttyUSB0 -b 115200`` and reset esp32s3-devkit.
5. Use ``mw -1`` on nsh to trigger a crash.
6. Get the crash log from minicom and save it to ``crash.log``.
.. code-block:: bash
up_dump_register: PC: 42009cd8 PS: 00060820
up_dump_register: A0: 82007d71 A1: 3fc8b6d0 A2: 3fc8b8e0 A3: 00000000
up_dump_register: A4: ffffffff A5: 00000000 A6: 00000001 A7: 00000000
up_dump_register: A8: ffffffff A9: 3fc8b690 A10: ffffffff A11: 00000000
up_dump_register: A12: 0000002d A13: 0000002d A14: 3fc8bb6d A15: 0fffffff
up_dump_register: SAR: 00000000 CAUSE: 0000001c VADDR: ffffffff
up_dump_register: LBEG: 40055499 LEND: 400554a9 LCNT: fffffffc
dump_stack: User Stack:
dump_stack: base: 0x3fc8b0e0
dump_stack: size: 00002000
dump_stack: sp: 0x3fc8b6d0
stack_dump: 0x3fc8b6c0: 82007770 3fc8b700 3fc8b8e0 00000002 ffffffff 3fc89f54 00060e20 00000000
stack_dump: 0x3fc8b6e0: 3fc8b8e0 3fc8b778 00000000 3fc8b750 82007850 3fc8b720 3fc8b8e0 00000002
stack_dump: 0x3fc8b700: 3fc8b720 42009c84 3fc8bb68 3fc8b8e0 82006b04 3fc8b7d0 3fc8b8e0 3fc8bb68
stack_dump: 0x3fc8b720: 3fc8bb68 3fc8bb6b 00000000 00000000 00000000 00000000 00000000 00000000
stack_dump: 0x3fc8b740: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
stack_dump: 0x3fc8b760: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fc8bb69
stack_dump: 0x3fc8b780: 82006ad5 00000000 00000000 00000040 00000040 3fc8bb6e 3fc8adf8 0000002c
stack_dump: 0x3fc8b7a0: ffffffff 00000005 00000000 00000000 3fc8bae0 00000000 00000000 00000000
stack_dump: 0x3fc8b7c0: 820068a2 3fc8b800 3fc8b8e0 3c020837 00000001 3fc8b800 3fc8b8e0 3c020837
stack_dump: 0x3fc8b7e0: 0000000a 3fc8bae0 00000001 3fc8bb68 82006865 3fc8b820 00000001 3fc8b0c0
stack_dump: 0x3fc8b800: 00000001 3fc8bb68 00000000 3fc8ae1c 82003618 3fc8b840 00000001 3fc8b0c0
stack_dump: 0x3fc8b820: 3fc8b8e0 00000000 00000000 00000000 820019dc 3fc8b870 42006834 00000001
stack_dump: 0x3fc8b840: 00000064 00000000 00000000 00000000 3c0225d8 3fc89590 00000000 3fc880cc
stack_dump: 0x3fc8b860: 00000000 3fc8b890 00000000 00000000 3fc8b0c0 00000002 00000000 3fc8ad98
stack_dump: 0x3fc8b880: 00000000 3fc8b8b0 00000000 00000000 00000000 00000000 00000000 00000000
7. Run ``./tools/gdbserver.py -e nuttx -l crash.log -p 1234 -a esp32s3``
8. Run ``xtensa-esp32s3-elf-gdb nuttx -ex "target remote 127.0.0.1:1234"``
.. code-block:: bash
GNU gdb (esp-gdb) 12.1_20221002
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=xtensa-esp-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from nuttx...
Remote debugging using 127.0.0.1:1234
0x42009cd8 in cmd_mw (vtbl=0x3fc8b8e0, argc=<optimized out>, argv=<optimized out>) at nsh_dbgcmds.c:259
259 nsh_output(vtbl, " %p = 0x%08" PRIx32, ptr, *ptr);
(gdb) bt
#0 0x42009cd8 in cmd_mw (vtbl=0x3fc8b8e0, argc=<optimized out>, argv=<optimized out>) at nsh_dbgcmds.c:259
#1 0x42007d71 in nsh_command (vtbl=0x3fc8b8e0, argc=2, argv=0x3fc8b720) at nsh_command.c:1154
#2 0x42007770 in nsh_execute (oflags=<optimized out>, redirfile=0x0, argv=0x3fc8b720, argc=2, vtbl=0x3fc8b8e0)
at nsh_parse.c:845
#3 nsh_parse_command (vtbl=0x3fc8b8e0, cmdline=<optimized out>) at nsh_parse.c:2744
#4 0x42007850 in nsh_parse (vtbl=0x3fc8b8e0,
cmdline=0x3fc8bb68 <error: Cannot access memory at address 0x3fc8bb68>) at nsh_parse.c:2828
#5 0x42006b04 in nsh_session (pstate=0x3fc8b8e0, login=<optimized out>, argc=1, argv=<optimized out>)
at nsh_session.c:245
#6 0x420068a2 in nsh_consolemain (argc=1, argv=0x3fc8b0c0) at nsh_consolemain.c:71
#7 0x42006865 in nsh_main (argc=1, argv=0x3fc8b0c0) at nsh_main.c:74
#8 0x42003618 in nxtask_startup (entrypt=0x42006834 <nsh_main>, argc=1, argv=0x3fc8b0c0)
at sched/task_startup.c:70
#9 0x420019dc in nxtask_start () at task/task_start.c:134
(gdb)
Raw file Example
================
1. If you obtain the memory file from your board, you can also use gdbserver.py to reconstruct the scene.
The most common way to get the raw file is to use the dump memory command
in GDB to dump the memory and save it as a file.
2. Run ``./tools/gdbserver.py -e nuttx -r rawfile:0x1000 -a arm``
3. Run gdb with target remote.
Coredump Example
================
1. If you have a coredump, you can also run ``./tools/gdbserver.py -e nuttx -c coredump -a arm``
2. Run gdb with target remote.
The benefit of this approach is that in a multi-core AMP system,
a single coredump might contain memory information from other cores.
By analyzing this coredump along with the corresponding ELF files from
the other cores, you can reconstruct the crash site of those other cores.
Thread awareness
================
``gdbserver.py`` implements thread debugging based on ``g_pidhash``, ``g_npidhash``,
and ``g_tcbinfo`` in NuttX. If these variables can be read from the log, raw file, or coredump you provide,
you can use thread-related commands in GDB, such as ``info thread`` or ``thread``.
How to add new architecture
===========================
The main objective is to establish the sequence of registers in GDB,
aligning the register names in the crash log with the order of registers in GDB.
This alignment will facilitate the creation of a new architecture's GDB server.
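The alignment described above can be pictured as two small steps: read the named registers out of the crash log, then emit them in the order GDB expects. The sketch below is not taken from ``tools/gdbserver.py``. The function names are hypothetical, and the ``order`` list passed in is only a placeholder that must be replaced with the real GDB register order of the target architecture.

.. code-block:: python

   import re

   # Matches "NAME: 8-hex-digits" pairs such as "PC: 42009cd8".
   REG_RE = re.compile(r"\b([A-Z][A-Z0-9]*):\s*([0-9a-fA-F]{8})\b")

   def parse_registers(log_text):
       """Collect register values from up_dump_register lines."""
       regs = {}
       for line in log_text.splitlines():
           if "up_dump_register:" not in line:
               continue
           for name, value in REG_RE.findall(line):
               regs[name] = int(value, 16)
       return regs

   def in_gdb_order(regs, order):
       """Arrange values by a register order; missing registers become None."""
       return [regs.get(name) for name in order]

   log = (
       "up_dump_register: PC: 42009cd8 PS: 00060820\n"
       "up_dump_register: A0: 82007d71 A1: 3fc8b6d0 A2: 3fc8b8e0 A3: 00000000\n"
   )
   regs = parse_registers(log)
   # Placeholder order for illustration only, not the real xtensa layout.
   print([hex(v) for v in in_gdb_order(regs, ["PC", "A0", "A1"])])
   # ['0x42009cd8', '0x82007d71', '0x3fc8b6d0']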
@@ -0,0 +1,48 @@
===============
GDB with Python
===============
Introduction
============
The NuttX kernel can be effectively debugged using GDB's Python extension.
Commonly used classes and utilities are implemented in the ``nuttx/tools/gdb/nuttxgdb`` directory.
Users can also create custom Python scripts tailored to their debugging needs to analyze and troubleshoot the NuttX kernel more efficiently.
Usage
=====
1. Compile NuttX with ``CONFIG_DEBUG_SYMBOLS=y`` and set ``CONFIG_DEBUG_SYMBOLS_LEVEL`` to ``-g3``.
2. Use GDB to debug the NuttX ELF binary (on a real device, a simulator, or with a coredump).
3. Add the following argument to the GDB command line: ``-ix="nuttx/tools/pynuttx/gdbinit.py"``
4. GDB will automatically load the Python script, enabling the use of custom commands.
How to write a GDB python script
================================
Here is an article that introduces the fundamental principles of Python in GDB. Read it to gain a basic understanding.
`Automate Debugging with GDB Python API <https://interrupt.memfault.com/blog/automate-debugging-with-gdb-python-api>`_.
For more documentation on gdb python, please refer to the official documentation of GDB.
`GDB Python API <https://sourceware.org/gdb/current/onlinedocs/gdb.html/Python-API.html#Python-API>`_.
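As a starting point, a custom command is a subclass of ``gdb.Command``. The sketch below is a generic illustration and is not one of the NuttX plugin commands: the command name ``show-symbol`` and the helper ``format_symbol`` are hypothetical, and the 32-bit mask assumes a 32-bit target. The ``gdb`` module exists only inside GDB's embedded interpreter, so the import is guarded to keep the formatting helper usable from a plain Python shell.

.. code-block:: python

   try:
       import gdb  # provided by GDB's embedded Python interpreter only
   except ImportError:
       gdb = None

   def format_symbol(name, value):
       """Render a value as 'name = 0x%08x'."""
       return f"{name} = {value:#010x}"

   if gdb is not None:
       class ShowSymbol(gdb.Command):
           """show-symbol EXPR: print EXPR as a zero-padded hex word."""

           def __init__(self):
               super().__init__("show-symbol", gdb.COMMAND_USER)

           def invoke(self, arg, from_tty):
               # Evaluate the expression in the debuggee's context.
               value = int(gdb.parse_and_eval(arg))
               # Assumption: 32-bit target, so mask to one word.
               gdb.write(format_symbol(arg, value & 0xFFFFFFFF) + "\n")

       ShowSymbol()  # instantiating the class registers the command

   print(format_symbol("g_npidhash", 8))  # g_npidhash = 0x00000008

Inside GDB the file can be loaded with ``source <file>.py``, after which ``show-symbol <expression>`` becomes available.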
Requirements
============
To use GDB with Python, the following requirements must be met:
- Use GDB compiled with Python support, Python 3.8 or later
- Install required Python packages: ``pip install -r tools/pynuttx/requirements.txt``
- Compile NuttX with debug level 3: ``CONFIG_DEBUG_SYMBOLS_LEVEL="-g3"``
.. Warning::
The GDB Python API is not available in all versions of GDB. Make sure to use a version that supports Python.
.. Warning::
NuttX must be compiled with ``CONFIG_DEBUG_SYMBOLS=y`` and ``CONFIG_DEBUG_SYMBOLS_LEVEL="-g3"`` to use GDB with Python.
.. toctree::
:caption: GDB Plugin Commands
:maxdepth: 1
gdb/irqinfo.rst
+21
View File
@@ -0,0 +1,21 @@
=========
Debugging
=========
This page contains a collection of guides on how to debug problems with NuttX.
.. toctree::
gdbwithpython.rst
qemugdb.rst
gdbserver.rst
debugging_elf_loadable_modules.rst
tasktrace.rst
kasan.rst
coredump.rst
coresight.rst
stackcheck.rst
stackrecord.rst
disabling_stackdumpdebug.rst
debuggingflash_nuttxonarm.rst
cortexmhardfaults.rst
mte.rst
+145
View File
@@ -0,0 +1,145 @@
====================================
The Kernel Address Sanitizer (KASAN)
====================================
Overview
--------
Kernel Address Sanitizer (KASAN) is a dynamic memory safety error detector
designed to find out-of-bounds and use-after-free bugs.
The current version of NuttX has two modes:
1. Generic KASAN
2. Software Tag-Based KASAN
Generic KASAN, enabled with CONFIG_MM_KASAN_GENERIC, is the mode intended for
debugging, similar to Linux user-level ASan. This mode is supported on many CPU
architectures, but it has significant performance and memory overheads.

NuttX Generic KASAN currently detects out-of-bounds accesses to memory
allocated by the default NuttX heap allocator, which depends on CONFIG_MM_DEFAULT_MANAGER
or CONFIG_MM_TLSF_MANAGER, as well as out-of-bounds accesses to global variables.

Software Tag-Based KASAN or SW_TAGS KASAN, enabled with CONFIG_MM_KASAN_SW_TAGS,
can be used for both debugging and testing. This mode is only supported on arm64,
but its moderate memory overhead allows using it for testing on
memory-restricted devices with real workloads.
Support
-------
Architectures
~~~~~~~~~~~~~
Generic KASAN is supported on x86_64, arm, arm64, riscv, xtensa, and other architectures.
Software Tag-Based KASAN is supported only on arm64.
Usage
-----
To enable Generic KASAN, configure the kernel with::
CONFIG_MM_KASAN=y
CONFIG_MM_KASAN_INSTRUMENT_ALL=y
CONFIG_MM_KASAN_GENERIC=y
To also enable out-of-bounds detection for global variables,
add the following on top of the configuration above::
CONFIG_MM_KASAN_GLOBAL=y
To enable Software Tag-Based KASAN, configure the kernel with::
CONFIG_MM_KASAN=y
CONFIG_MM_KASAN_INSTRUMENT_ALL=y
CONFIG_MM_KASAN_SW_TAGS=y
Implementation details
----------------------
Generic KASAN:
The code is compiled with ``-fsanitize=kernel-address``.
Compile-time instrumentation is used to insert memory access checks: the compiler
inserts function calls (``__asan_load*(addr)``, ``__asan_store*(addr)``) before
each memory access of size 1, 2, 4, 8, or 16. These functions check whether
memory accesses are valid or not by checking the corresponding shadow memory.

The NuttX implementation differs slightly from Linux.
First, the source of the shadow area is different:
NuttX's shadow area comes from the end of each heap. During heap initialization,
the heap end is offset and a kasan region is carved out at the end.
The regions of multiple heaps are chained together in a linked list.
Second, to reduce memory consumption,
the NuttX implementation uses a bitmap for detection.
For example, on a 32-bit machine,
the kasan module keeps one shadow bit for each four-byte group
of memory handed out by the NuttX heap allocator. If the shadow bit is 0,
the memory group can be accessed; if it is 1, the group is inaccessible.
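The bitmap idea can be modeled in a few lines. The sketch below is a toy simulation, not the NuttX implementation: the class ``BitmapShadow`` and its method names are hypothetical, and the four-byte group size follows the 32-bit example above.

.. code-block:: python

    class BitmapShadow:
        """Toy model of a bitmap shadow: one bit per `granule` bytes of heap."""

        def __init__(self, base, size, granule=4):
            self.base = base
            self.size = size
            self.granule = granule
            ngroups = size // granule
            # Every group starts poisoned (bit = 1, inaccessible).
            self.bits = bytearray(b"\xff" * ((ngroups + 7) // 8))

        def _groups(self, addr, size):
            first = (addr - self.base) // self.granule
            last = (addr + size - 1 - self.base) // self.granule
            return range(first, last + 1)

        def unpoison(self, addr, size):
            """Mark [addr, addr + size) accessible, as an allocation would."""
            for g in self._groups(addr, size):
                self.bits[g // 8] &= ~(1 << (g % 8)) & 0xFF

        def poison(self, addr, size):
            """Mark [addr, addr + size) inaccessible, as a free would."""
            for g in self._groups(addr, size):
                self.bits[g // 8] |= 1 << (g % 8)

        def is_valid(self, addr, size):
            """Return True if every group touched by the access has bit 0."""
            if addr < self.base or addr + size > self.base + self.size:
                return False
            return all(((self.bits[g // 8] >> (g % 8)) & 1) == 0
                       for g in self._groups(addr, size))

With one bit per four bytes, the shadow costs 1/32 of the covered memory in this model, which is the motivation for the bitmap layout.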
Third, global variable out-of-bounds detection
in NuttX is also implemented differently from Linux.
Due to the particularity of the shadow region, NuttX needs to construct kasan regions
separately for the data and bss segments where the global variables are located.
Before compiling, add the compile option ``--param asan-globals=1``.
The compiler then stores information about all global variables in special sections
such as ``.data..LASAN0``. This information
can be parsed using the following structure::
struct kasan_global {
const void *beg; /* Address of the beginning of the global variable. */
size_t size; /* Size of the global variable. */
size_t size_with_redzone; /* Size of the variable + size of the redzone. 32 bytes aligned. */
const void *name;
const void *module_name; /* Name of the module where the global variable is declared. */
unsigned long has_dynamic_init; /* This is needed for C++. */
/* It will point to a location that stores the file row,
* column, and file name information of each global variable */
struct kasan_source_location *location;
char *odr_indicator;
};
To keep the compiler-generated data from occupying precious flash space,
NuttX links in multiple passes: scripts extract the global variable information from the ELF,
construct the region and shadow of the global variables according to the kasan region rules,
form an array, and finally link it into the program. The program adds the array to kasan's region linked list.
The data generated by the compiler is placed in a non-existent memory block.
After the build completes, this segment is deleted
and is not copied into the bin file that is finally flashed to the board.
Software Tag-Based KASAN:
Software Tag-Based KASAN uses a software memory tagging approach to checking
access validity. It is currently only implemented for the arm64 architecture.
Software Tag-Based KASAN uses the Top Byte Ignore (TBI) feature of arm64 CPUs
to store a pointer tag in the top byte of kernel pointers. It uses shadow memory
to store memory tags associated with each heap-allocated memory cell (therefore, it
dedicates 1/8th of the kernel memory to shadow memory).
On each memory allocation, Software Tag-Based KASAN generates a random tag, tags
the allocated memory with this tag, and embeds the same tag into the returned
pointer.
Software Tag-Based KASAN uses compile-time instrumentation to insert checks
before each memory access. These checks make sure that the tag of the memory
that is being accessed is equal to the tag of the pointer that is used to access
this memory. In case of a tag mismatch, Software Tag-Based KASAN prints a bug
report.
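The tagging scheme described above can be sketched as a small simulation. This is an illustrative model only: the class ``TaggedHeap``, the 8-byte granule (chosen to match the 1/8 shadow ratio mentioned above), and the tag range are assumptions, not the NuttX code.

.. code-block:: python

    import random

    TAG_SHIFT = 56                    # Top byte of a 64-bit pointer holds the tag.
    ADDR_MASK = (1 << TAG_SHIFT) - 1
    GRANULE = 8                       # Assumed bytes of memory per shadow byte.


    class TaggedHeap:
        """Toy model of software tag checking using the pointer top byte."""

        def __init__(self, seed=None):
            self.shadow = {}          # granule index -> memory tag
            self.rng = random.Random(seed)

        def tag_allocation(self, addr, size):
            """Tag [addr, addr + size) and return a pointer with the same tag."""
            tag = self.rng.randrange(1, 256)
            first, last = addr // GRANULE, (addr + size - 1) // GRANULE
            for g in range(first, last + 1):
                self.shadow[g] = tag
            return (tag << TAG_SHIFT) | addr

        def check(self, ptr, size=1):
            """Return True if the pointer tag matches every granule accessed."""
            tag, addr = ptr >> TAG_SHIFT, ptr & ADDR_MASK
            first, last = addr // GRANULE, (addr + size - 1) // GRANULE
            return all(self.shadow.get(g) == tag for g in range(first, last + 1))

In this model a failed ``check`` corresponds to the tag mismatch that would trigger a bug report. Two neighboring allocations can still receive the same random tag, so detection is probabilistic.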
For developers
--------------
Ignoring accesses
~~~~~~~~~~~~~~~~~
If you do not want the module you are writing to be instrumented by the compiler,
you can add the option ``CFLAGS += -fno-sanitize=kernel-address`` to that module.
For a single file, you can use a target-specific rule such as
``special_file.o: CFLAGS = -fno-sanitize=kernel-address``.
+95
View File
@@ -0,0 +1,95 @@
====================================
ARM64 MTE extension
====================================
Introduction
------------
Arm v8.5 introduced the Arm Memory Tagging Extension (MTE),
a hardware implementation of tagged memory.
Basically, MTE tags every memory allocation/deallocation
with additional metadata. It assigns a tag to a memory location,
which can then be associated with a pointer that references
that memory location. At runtime, the CPU checks that the pointer
and metadata tags match with every load and store.
NuttX currently supports deploying MTE on ARM64 QEMU,
where it is supported at the EL1 level.
Principle
---------
The Arm Memory Tagging Extension implements lock and key access to memory.
Locks can be set on memory and keys provided during memory access. If the key matches
the lock, the access is permitted. If it does not match, an error is reported.
Memory locations are tagged by adding four bits of metadata to each 16 bytes
of physical memory. This is the Tag Granule. Tagging memory implements the lock.
Pointers, and therefore virtual addresses, are modified to contain the key.
In order to implement the key bits without requiring larger pointers MTE uses the Top Byte
Ignore (TBI) feature of the Armv8-A Architecture. When TBI is enabled, the top byte of
a virtual address is ignored when using it as an input for address translation. This allows the
top byte to store metadata. In MTE, four bits of the top byte are used to provide the key.
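The lock-and-key arithmetic can be illustrated with plain integer operations. The sketch below assumes the key sits in bits [59:56] of the pointer and models tag memory as a dictionary; the helper names are hypothetical and nothing here executes real MTE instructions.

.. code-block:: python

    MTE_GRANULE = 16        # Bytes covered by one 4-bit tag (the Tag Granule).
    MTE_TAG_SHIFT = 56      # The key lives in bits [59:56] of the address.
    MTE_TAG_MASK = 0xF


    def mte_key(ptr):
        """Extract the 4-bit key from a tagged pointer."""
        return (ptr >> MTE_TAG_SHIFT) & MTE_TAG_MASK


    def mte_with_key(ptr, key):
        """Return ptr with its key replaced, leaving the other bits unchanged."""
        cleared = ptr & ~(MTE_TAG_MASK << MTE_TAG_SHIFT)
        return cleared | ((key & MTE_TAG_MASK) << MTE_TAG_SHIFT)


    def mte_tag_storage_bytes(region_size):
        """Bytes of tag storage for a region: 4 bits per 16-byte granule."""
        granules = (region_size + MTE_GRANULE - 1) // MTE_GRANULE
        return (granules * 4 + 7) // 8


    def mte_access_ok(ptr, locks):
        """Compare the pointer key with the lock of the granule it addresses.

        `locks` maps granule index -> 4-bit lock (a stand-in for tag memory).
        """
        addr = ptr & ((1 << MTE_TAG_SHIFT) - 1)   # Top byte ignored (TBI).
        return locks.get(addr // MTE_GRANULE) == mte_key(ptr)

For example, a 4096-byte region has 256 granules and therefore needs 128 bytes of tag storage in this model.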
Architectural Details
---------------------
MTE adds instructions to the Armv8-A Architecture that are outlined below and grouped
into three different categories [6]:
Instructions for tag manipulation applicable to stack and heap tagging:
IRG
In order for the statistical basis of MTE to be valid, a source of random tags is required.
IRG is defined to provide this in hardware and insert such a tag into a register for use
by other instructions.
GMI
This instruction is for manipulating the excluded set of tags for use with the IRG instruction.
This is intended for cases where software uses specific tag values for special purposes
while retaining random tag behavior for normal allocations.
LDG, STG, and STZG
These instructions allow getting or setting tags in memory. They are intended for changing
tags in memory either without modifying the data or zeroing the data.
ST2G and STZ2G
These are denser alternatives to STG and STZG which operate on two granules of memory
when allocation size allows them to be used.
STGP
This instruction stores both tag and data to memory.
Instructions intended for pointer arithmetic and stack tagging:
ADDG and SUBG
These are variants of the ADD and SUB instructions, intended for arithmetic on addresses.
They allow both the tag and address to be separately modified by an immediate value.
These instructions are intended for creating the addresses of objects on the stack.
SUBP(S)
This instruction provides a 56-bit subtract with optional flag setting which is required
for pointer arithmetic that ignores the tag in the top byte.
Instructions intended for system use:
LDGM, STGM, and STZGM
These are bulk tag manipulation instructions which are UNDEFINED at EL0. These are
intended for system software to manipulate tags for the purposes of initialization and
serialization. For example, they can be used to implement swapping of tagged memory
to a medium which is not tag-aware. The zeroing form can be used for efficient
initialization of memory.
NuttX currently supports executing the above instructions,
such as irg, ldg, and stg.
Test programs for them are stored in "apps/system/mte" and can be used to check whether the current system supports them.
Usage
-----
To try out MTE on NuttX,
enable ARM64_MTE by configuring the kernel with::
CONFIG_ARM64_MTE=y
You can also run it with the existing configuration
``boards/arm64/qemu/qemu-armv8a/configs/mte``.
+89
View File
@@ -0,0 +1,89 @@
.. include:: /substitutions.rst
.. _qemugdb:
=====================================
How to debug NuttX using QEMU and GDB
=====================================
This guide explains the steps needed to use QEMU and GDB to debug
an ARM board (lm3s6965-ek), but it can be adapted to other
boards or architectures supported by QEMU.
Start by configuring and compiling the lm3s6965-ek board with the qemu-flat profile.
Compiling
=========
#. Configure the lm3s6965-ek
There is a sample configuration to use lm3s6965-ek on QEMU.
Just use ``lm3s6965-ek:qemu-flat`` board profile for this purpose.
.. code-block:: console
$ cd nuttx
$ ./tools/configure.sh lm3s6965-ek:qemu-flat
#. Compile
.. code-block:: console
$ make -j
Start QEMU
==========
#. You need to start QEMU using the NuttX ELF file just created above:
.. code-block:: console
$ qemu-system-arm -M lm3s6965evb -device loader,file=nuttx -serial mon:stdio -nographic -s
Timer with period zero, disabling
ABCDF
telnetd [4:100]
NuttShell (NSH) NuttX-12.0.0
nsh>
Start GDB to connect to QEMU
============================
These steps show how to connect GDB to QEMU running NuttX:
.. code-block:: console
$ gdb-multiarch nuttx -ex "source tools/pynuttx/gdbinit.py" -ex "target remote 127.0.0.1:1234"
Reading symbols from nuttx...
Registering NuttX GDB commands from ~/nuttx/nuttx/tools/gdb/nuttxgdb
set pagination off
set python print-stack full
"handle SIGUSR1 "nostop" "pass" "noprint"
Load macro: ~/nuttx/nuttx/b73e7dbb3d3bbd6ff2eb9be4e5f01d5e.json
readelf took 0.1 seconds
Parse macro took 0.1 seconds
Cache macro info to ~/nuttx/nuttx/b73e7dbb3d3bbd6ff2eb9be4e5f01d5e.json
if use thread command, please don't use 'continue', use 'c' instead !!!
if use thread command, please don't use 'step', use 's' instead !!!
Build version: "86868a9e194-dirty Nov 26 2024 00:14:53"
Remote debugging using :1234
0x0000b78a in up_idle () at chip/common/tiva_idle.c:62
62 }
(gdb)
#. From the (gdb) prompt you can run commands to inspect NuttX:
.. code-block:: console
(gdb) info threads
Id Thread Info Frame
*0 Thread 0x2000168c (Name: Idle_Task, State: Running, Priority: 0, Stack: 1008) 0xa45a up_idle() at chip/common/tiva_idle.c:62
1 Thread 0x20005270 (Name: hpwork, State: Waiting,Semaphore, Priority: 224, Stack: 1984) 0xa68c up_switch_context() at common/arm_switchcontext.c:95
2 Thread 0x20005e30 (Name: nsh_main, State: Waiting,Semaphore, Priority: 100, Stack: 2008) 0xa68c up_switch_context() at common/arm_switchcontext.c:95
3 Thread 0x20006d48 (Name: NTP_daemon, State: Waiting,Signal, Priority: 100, Stack: 1960) 0xa68c up_switch_context() at common/arm_switchcontext.c:95
4 Thread 0x20008b60 (Name: telnetd, State: Waiting,Semaphore, Priority: 100, Stack: 2016) 0xa68c up_switch_context() at common/arm_switchcontext.c:95
(gdb)
As you can see, QEMU and GDB are powerful tools to debug NuttX without using an external board or expensive debugging hardware.
+65
View File
@@ -0,0 +1,65 @@
====================================
Stack Overflow Check
====================================
Overview
--------
Currently NuttX supports three types of stack overflow detection:
1. Stack Overflow Software Check
2. Stack Overflow Hardware Check
3. Stack Canary Check
The software stack overflow check can be implemented in two ways:
1. Implemented by coloring the stack memory
2. Implemented by comparing the sp and sl registers
Support
-------
Software and hardware stack overflow detection are
currently implemented only on ARM Cortex-M (32-bit) series chips.
Stack Canary Check is available on all platforms.
Stack Overflow Software Check
-----------------------------
1. Memory Coloring Implementation Principle
1. Before the stack is used, the thread's stack area is filled with 0xdeadbeef
2. While the thread is running, it overwrites 0xdeadbeef in the part of the stack it uses
3. up_check_tcbstack() looks for the remaining 0xdeadbeef values to get the stack peak value
Usage:
Enable CONFIG_STACK_COLORATION
2. Compare sp and sl
When compiling the program, reserve r10 and use it as the stack base::

    ARCHOPTIMIZATION += -finstrument-functions -ffixed-r10

The compiler then automatically inserts calls to the following hooks when entering and exiting each function:

- ``__cyg_profile_func_enter``
- ``__cyg_profile_func_exit``
Usage:
Enable CONFIG_ARMV8M_STACKCHECK or CONFIG_ARMV7M_STACKCHECK
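The memory coloring approach from item 1 can be modeled as a short simulation. The helpers ``color_stack`` and ``peak_usage`` are hypothetical names for this sketch, which assumes a downward-growing stack of 32-bit words; it is not the code behind ``up_check_tcbstack()``.

.. code-block:: python

    import struct

    STACK_COLOR = 0xDEADBEEF


    def color_stack(size_words):
        """Return a stack image filled with the color, one 32-bit word each."""
        return bytearray(struct.pack("<I", STACK_COLOR) * size_words)


    def peak_usage(stack):
        """Return peak stack usage in bytes for a stack that grows downward.

        The scan starts at the lowest address (the far end of the stack) and
        counts words that still hold the color; the rest is treated as used.
        """
        words = struct.unpack(f"<{len(stack) // 4}I", stack)
        untouched = 0
        for word in words:
            if word != STACK_COLOR:
                break
            untouched += 1
        return (len(words) - untouched) * 4

A limitation of this technique is that a thread that happens to store the color value itself, or that skips over part of the stack without writing to it, can make the reported peak inaccurate.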
Stack Overflow Hardware Check
-----------------------------
1. MSPLIM and PSPLIM are set on every context switch
2. Each time sp is modified, the hardware automatically compares sp with PSPLIM. If sp is lower than PSPLIM, a fault is raised and the system crashes
Usage:
Enable CONFIG_ARMV8M_STACKCHECK_HARDWARE
Stack Canary Check
-----------------------------
1. A canary value is added to the stack
2. If the stack overflows while the thread is running, the canary value is overwritten
3. While the thread is running, the canary value is compared with the original value
4. If the values differ, the stack has overflowed
Usage:
Enable CONFIG_STACK_CANARIES
+73
View File
@@ -0,0 +1,73 @@
=========================
Run time stack statistics
=========================
Introduction
============
When debugging code, it is often necessary to focus on how to track
the maximum stack usage of the calling functions in order to optimize
the code structure and reduce stack usage. This article will introduce
a method based on the running state to track the deepest call stack of all tasks.
Configuration
=============
.. code-block:: c
CONFIG_SCHED_STACKRECORD=32
CONFIG_ARCH_INSTRUMENT_ALL=y
``CONFIG_SCHED_STACKRECORD`` is used to record the maximum stack usage of all tasks.
``CONFIG_ARCH_INSTRUMENT_ALL`` is used to instrument all code.
Note that ``CONFIG_ARCH_INSTRUMENT_ALL`` is optional.
It instruments all code;
if you only want to instrument specific functions,
you can add ``-finstrument-functions`` to the corresponding makefile instead.
Example
=======
1. ``./tools/configure.sh esp32c3-devkit:stack``
2. ``make -j20``
3. Flash the image to your board
.. code-block:: bash
nsh> cat /proc/1/stack
StackAlloc: 0x3fc8b5b0
StackBase: 0x3fc8b5e0
StackSize: 2000
MaxStackUsed:1344
Backtrace Size
0x42009198 32
0x42009200 48
0x420081a0 128
0x42008d18 64
0x4201da60 80
0x420199e0 80
0x42018c6c 48
0x420194f4 48
0x42017d30 32
0x4201634c 32
0x420163ac 48
0x42016408 32
0x420132c0 48
0x42010598 32
0x4200fd98 48
0x4200f5dc 80
0x4200f8e0 160
Implementation details
======================
The mechanism is based on the function instrumentation provided by GCC.
At the entry of each function, the current stack pointer (sp) is compared with
the deepest value recorded so far in the TCB (Thread Control Block) of the corresponding task.
If the current value is a new maximum, it is stored and the backtrace is recorded.
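The bookkeeping can be sketched as follows. The class ``StackRecorder`` is a hypothetical model in which entry and exit hooks are called explicitly with a frame size; in the real system the hooks are inserted by the compiler and the stack pointer is read from the CPU.

.. code-block:: python

    class StackRecorder:
        """Toy model of per-task deepest-stack tracking via entry hooks."""

        def __init__(self, stack_base):
            self.stack_base = stack_base
            self.sp = stack_base      # Stack grows downward from the base.
            self.frames = []          # Current call chain: (function, size).
            self.max_used = 0
            self.max_backtrace = []

        def enter(self, func, frame_size):
            """Entry hook: push a frame, record backtrace on a new maximum."""
            self.sp -= frame_size
            self.frames.append((func, frame_size))
            used = self.stack_base - self.sp
            if used > self.max_used:
                self.max_used = used
                self.max_backtrace = list(self.frames)

        def leave(self):
            """Exit hook: pop the most recent frame."""
            _, frame_size = self.frames.pop()
            self.sp += frame_size

The recorded ``max_backtrace`` plays the role of the ``Backtrace`` and ``Size`` columns shown in the ``/proc/<pid>/stack`` example.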
Notice
======
Be cautious when using the CONFIG_ARCH_INSTRUMENT_ALL option:
1. It will instrument every function, which may have a risk of recursion.
2. It will also instrument entry functions, such as _start(). At this point,
the bss segment and data segment have not been initialized,
which may cause errors. The current implementation uses a magic number to avoid this,
but it performs poorly during hot start. The solution is to mark
the entry function with the noinstrument_function flag to prevent instrumentation.
+20
View File
@@ -0,0 +1,20 @@
==========
Task Trace
==========
Task Trace is a tool that collects various events in the NuttX kernel and displays the result graphically.
It can collect the following events.
- Task execution, termination, switching
- System call enter/leave
- Interrupt handler enter/leave
.. toctree::
:maxdepth: 1
:caption: Contents:
tasktraceuser.rst
tasktraceinternal.rst
.. image:: image/task-trace-overview.png
@@ -0,0 +1,96 @@
====================
Task Trace Internals
====================
Overview
========
.. image:: image/task-trace-internal.png
Task Trace consists of the following components.
NuttX kernel events collection
------------------------------
The kernel events are collected by ``sched_note_*()`` API calls embedded in NuttX kernel.
- For task switch events
- ``sched_note_start()``
- ``sched_note_stop()``
- ``sched_note_suspend()``
- ``sched_note_resume()``
- For system call events
- ``sched_note_syscall_enter()``
- ``sched_note_syscall_leave()``
- For interrupt events
- ``sched_note_irqhandler()``
Filter logic (``nuttx/sched/sched_note.c``)
-------------------------------------------
- The ``sched_note_*()`` APIs are implemented here.
- Filter the notes and pass them to noteram driver by ``sched_note_add()`` API.
Noteram device driver (``nuttx/drivers/note/noteram_driver.c``)
---------------------------------------------------------------
- Accumulate incoming note records into the buffer.
- Read the note records from the buffer by user requests.
- The notes are recorded in the binary format of ``struct note_*_s``.
- The functionality is described in detail in :doc:`../components/drivers/character/note`.
Notectl device driver (``nuttx/drivers/note/notectl_driver.c``)
---------------------------------------------------------------
- ``/dev/notectl`` device driver.
- Control the filter logic in ``sched_note.c`` by calling note filter APIs.
- The functionality is described in detail in :doc:`../components/drivers/character/note`.
"``trace``" Built-In Application (``apps/system/trace/trace.c``)
----------------------------------------------------------------
- ``trace`` Built-In Application to control the trace function interactively.
- Read binary note records from ``/dev/note`` and convert them into the ftrace text format accepted by `"Trace Compass" <https://www.eclipse.org/tracecompass/>`_.
- The command syntax is described in :doc:`tasktraceuser`.
Getting the system call events
==============================
To get the system call events, two different methods are used for FLAT build and PROTECTED/KERNEL build.
FLAT build
----------
In FLAT build, a system call is just a function call into the NuttX kernel.
.. image:: image/syscall-flat-before.png
To get the system call events, the `wrapper function option <https://sourceware.org/binutils/docs/ld/Options.html#index-_002d_002dwrap_003dsymbol>`_ of the GNU Linker is used.
The mksyscall tool has been modified to generate system call wrappers that call the system call enter/leave hooks.
Passing the ``--wrap`` linker option to the build system makes the wrapper supersede the calls to the system call function in the NuttX binary.
The wrapper calls the system call hooks before and after calling the real system call function.
.. image:: image/syscall-flat-after.png
PROTECTED/KERNEL build
----------------------
Unlike the FLAT build, in PROTECTED and KERNEL builds a system call issued from user space is handled in the following steps.
#. System call issued by an application code is handled by the system call proxy (automatically generated by mksyscall).
#. System call proxy issues the supervisor call instruction to enter into the kernel space.
#. System call handler in the kernel space calls the system call stub (automatically generated by mksyscall).
#. System call stub calls the API implementation in the NuttX kernel.
.. image:: image/syscall-protected-before.png
To get the system call events, the mksyscall tool has been modified to generate a system call wrapper that supersedes the system call function call in the system call stub.
.. image:: image/syscall-protected-after.png
+359
View File
@@ -0,0 +1,359 @@
=====================
Task Trace User Guide
=====================
Installation
============
Install Trace Compass
---------------------
Task Trace uses the external tool `"Trace Compass" <https://www.eclipse.org/tracecompass/>`_ to display the trace result.
Download it from https://www.eclipse.org/tracecompass/ and install into the host environment.
After the installation, execute it and choose ``Tools`` -> ``add-ons`` menu, then select ``Install Extensions`` to install the extension named "Trace Compass ftrace (Incubation)".
NuttX kernel configuration
--------------------------
To enable the task trace function, the NuttX kernel configuration needs to be modified.
The following configurations must be enabled.
- ``CONFIG_SCHED_INSTRUMENTATION`` : Enables the feature of scheduler notes.
- ``CONFIG_SCHED_INSTRUMENTATION_FILTER`` : Enables the filter logic of the notes.
- ``CONFIG_SCHED_INSTRUMENTATION_SYSCALL`` : Enables system call instrumentation.
- ``CONFIG_SCHED_INSTRUMENTATION_IRQHANDLER`` : Enables IRQ instrumentation.
- ``CONFIG_DRIVERS_NOTE`` : Enables note driver support.
- ``CONFIG_DRIVERS_NOTERAM`` : Enables ``/dev/note`` in-memory buffering driver.
- ``CONFIG_DRIVERS_NOTECTL`` : Enables ``/dev/notectl`` filter control driver.
- ``CONFIG_SYSTEM_TRACE`` : Enables the "``trace``" command.
- ``CONFIG_SYSTEM_SYSTEM`` : Enables "``system``" command (required by :ref:`trace_cmd`)
The following configurations are configurable parameters for trace.
- ``CONFIG_SCHED_INSTRUMENTATION_FILTER_DEFAULT_MODE``
- Specify the default filter mode.
If the following bits are set, the corresponding instrumentations are enabled on boot.
- Bit 0 = Enable instrumentation
- Bit 1 = Enable syscall instrumentation
- Bit 2 = Enable IRQ instrumentation
- Bit 3 = Enable collecting syscall arguments
- ``CONFIG_DRIVERS_NOTE_TASKNAME_BUFSIZE``
- Specify the task name buffer size in bytes.
The buffer is used to hold the name of the task during instrumentation.
Trace dump can find and show a task name corresponding to given pid in the instrumentation data by using this buffer.
If 0 is specified, this feature is disabled and trace dump shows only the name of the newly created task.
- ``CONFIG_DRIVERS_NOTERAM_BUFSIZE``
- Specify the note buffer size in bytes.
A higher value can hold more note records, but consumes more kernel memory.
- ``CONFIG_DRIVERS_NOTERAM_DEFAULT_NOOVERWRITE``
- If enabled, stop overwriting old notes in the circular buffer when the buffer is full by default.
This is useful to keep instrumentation data of the beginning of a system boot.
- ``CONFIG_DRIVERS_NOTERAM_CRASH_DUMP``
- If enabled, it will dump the data in the noteram buffer after a system crash.
This function helps to inspect the behavior of the system before the crash.
After the configuration, rebuild the NuttX kernel and application.
If the trace function is enabled, "``trace``" :doc:`../applications/nsh/builtin` will be available.
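The bit layout of ``CONFIG_SCHED_INSTRUMENTATION_FILTER_DEFAULT_MODE`` listed above can be decoded with a small helper. The table ``MODE_BITS`` and the function ``describe_mode`` are hypothetical names used only for this sketch.

.. code-block:: python

    # Bit positions as listed for the default filter mode.
    MODE_BITS = {
        0: "instrumentation",
        1: "syscall instrumentation",
        2: "IRQ instrumentation",
        3: "syscall arguments",
    }


    def describe_mode(value):
        """Return the features enabled on boot by a default-mode value."""
        return [name for bit, name in MODE_BITS.items() if value & (1 << bit)]

For example, a value of ``0x7`` enables instrumentation together with syscall and IRQ instrumentation, but does not collect syscall arguments.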
How to get trace data
=====================
The trace function can be controlled by "``trace``" command.
Quick Guide
-----------
Getting the trace
^^^^^^^^^^^^^^^^^
Trace is started by the following command.
.. code-block::
nsh> trace start
Trace is stopped by the following command.
.. code-block::
nsh> trace stop
If you want to get the trace while executing some command, the following command can be used.
.. code-block::
nsh> trace cmd <command> [<args>...]
Displaying the trace result
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The trace result is accumulated in the memory.
After getting the trace, the following command displays the accumulated trace data to the console.
.. code-block::
nsh> trace dump
This will get the trace results like the following:
.. code-block::
<noname>-1 [0] 7.640000000: sys_close()
<noname>-1 [0] 7.640000000: sys_close -> 0
<noname>-1 [0] 7.640000000: sys_sched_lock()
<noname>-1 [0] 7.640000000: sys_sched_lock -> 0
<noname>-1 [0] 7.640000000: sys_nxsched_get_stackinfo()
<noname>-1 [0] 7.640000000: sys_nxsched_get_stackinfo -> 0
<noname>-1 [0] 7.640000000: sys_sched_unlock()
<noname>-1 [0] 7.640000000: sys_sched_unlock -> 0
<noname>-1 [0] 7.640000000: sys_clock_nanosleep()
<noname>-1 [0] 7.640000000: sched_switch: prev_comm=<noname> prev_pid=1 prev_state=S ==> next_comm=<noname> next_pid=0
<noname>-0 [0] 7.640000000: irq_handler_entry: irq=11
<noname>-0 [0] 7.640000000: irq_handler_exit: irq=11
<noname>-0 [0] 7.640000000: irq_handler_entry: irq=15
<noname>-0 [0] 7.650000000: irq_handler_exit: irq=15
<noname>-0 [0] 7.650000000: irq_handler_entry: irq=15
:
By using the logging function of your terminal software, the trace result can be saved into the host environment and it can be used as the input for `"Trace Compass" <https://www.eclipse.org/tracecompass/>`_.
If the target has storage, the trace result can be stored in a file by using the following command.
After transferring the file from the target device to the host, it can also be used as the input for "Trace Compass".
.. code-block::
nsh> trace dump <file name>
To display the trace result with `"Trace Compass" <https://www.eclipse.org/tracecompass/>`_, choose the ``File`` -> ``Open Trace`` menu and specify the trace data file name.
.. image:: image/trace-compass-screenshot.png
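Before loading a saved dump into Trace Compass, it can be handy to sanity-check it on the host. The sketch below parses lines in the format shown in the ``trace dump`` example and counts events by name. The regular expression is inferred from that sample output only, so it is an assumption rather than a format specification, and the function names are hypothetical.

.. code-block:: python

    import re
    from collections import Counter

    # <task name>-<pid> [<cpu>] <seconds>: <event text>
    LINE_RE = re.compile(
        r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
        r"(?P<time>\d+\.\d+):\s+(?P<event>.*)$"
    )


    def parse_line(line):
        """Parse one dump line into a dict, or return None if it differs."""
        m = LINE_RE.match(line)
        if m is None:
            return None
        return {
            "comm": m["comm"],
            "pid": int(m["pid"]),
            "cpu": int(m["cpu"]),
            "time": float(m["time"]),
            "event": m["event"],
        }


    def count_events(lines):
        """Count events by name (text before the first ':', '(' or space)."""
        counts = Counter()
        for line in lines:
            rec = parse_line(line)
            if rec:
                name = re.split(r"[:( ]", rec["event"], maxsplit=1)[0]
                counts[name] += 1
        return counts

Lines that do not match, such as terminal log noise, are silently skipped by ``count_events``.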
Trace command description
=========================
.. _trace_start:
trace start
-----------
Start task tracing
**Command Syntax:**
.. code-block::
trace start [-c][<duration>]
- ``-c`` : Continue the previous trace.
The trace data is not cleared before starting new trace.
- ``<duration>`` : Specify the duration of the trace in seconds.
Task tracing is stopped after the specified period.
If not specified, the tracing continues until stopped by the command.
.. _trace_stop:
trace stop
----------
Stop task tracing
**Command Syntax:**
.. code-block::
trace stop
.. _trace_cmd:
trace cmd
---------
Get the trace while running the specified command.
After the termination of the command, task tracing is stopped.
To use this command, ``CONFIG_SYSTEM_SYSTEM`` needs to be enabled.
**Command Syntax:**
.. code-block::
trace cmd [-c] <command> [<args>...]
- ``-c`` : Continue the previous trace.
The trace data is not cleared before starting new trace.
- ``<command>`` : Specify the command to get the task trace.
- ``<args>`` : Arguments for the command.
**Example:**
.. code-block::
nsh> trace cmd sleep 1
.. _trace_dump:
trace dump
----------
Output the trace result.
If the task trace is running, it is stopped before the output.
**Command Syntax:**
.. code-block::
trace dump [-c][<filename>]
- ``-c`` : Do not stop tracing before the output.
Because dumping the trace is itself a task activity and new trace data is added during the output, the dump will never stop.
- ``<filename>`` : Specify the filename to save the trace result.
If not specified, the trace result is displayed to console.
.. _trace_mode:
trace mode
----------
Set the task trace mode options.
The default value is given by the kernel configuration ``CONFIG_SCHED_INSTRUMENTATION_FILTER_DEFAULT_MODE``.
**Command Syntax:**
.. code-block::
trace mode [{+|-}{o|s|a|i}...]
- ``+o`` : Enable overwrite mode.
The trace buffer is a ring buffer that can overwrite old data when no free space is available.
This option enables that behavior.
- ``-o`` : Disable overwrite mode.
New trace data is discarded when the buffer is full.
This is useful to keep the data of the beginning of the trace.
- ``+s`` : Enable system call trace.
It records the enter/leave events of system calls issued by the application.
All system calls are recorded by default. ``trace syscall`` command can filter the system calls to be recorded.
- ``-s`` : Disable system call trace.
- ``+a`` : Enable recording the system call arguments.
It records the arguments passed to the issued system call to the trace data.
- ``-a`` : Disable recording the system call arguments.
- ``+i`` : Enable interrupt trace.
It records the enter/leave events of interrupt handlers that run while tracing.
All IRQs are recorded by default. ``trace irq`` command can filter the IRQs to be recorded.
- ``-i`` : Disable interrupt trace.
If no command parameters are specified, the current mode is displayed as follows.
**Example:**
.. code-block::
nsh> trace mode
Task trace mode:
Trace : enabled
Overwrite : on (+o)
Syscall trace : on (+s)
Filtered Syscalls : 16
Syscall trace with args : on (+a)
IRQ trace : on (+i)
Filtered IRQs : 2
.. _trace_syscall:
trace syscall
-------------
Configure the filter of the system call trace.
**Command Syntax:**
.. code-block::
trace syscall [{+|-}<syscallname>...]
- ``+<syscallname>`` : Add the specified system call name to the filter.
The execution of the filtered system call is not recorded into the trace data.
- ``-<syscallname>`` : Remove the specified system call name from the filter.
Wildcard "``*``" can be used to specify the system call name.
For example, "``trace syscall +sem_*``" filters the system calls that begin with "``sem_``", such as ``sem_post()``, ``sem_wait()``,...
If no command parameters are specified, the current filter settings are displayed as follows.
**Example:**
.. code-block:: console
nsh> trace syscall
Filtered Syscalls: 16
getpid
sem_destroy
sem_post
sem_timedwait
sem_trywait
sem_wait
mq_close
mq_getattr
mq_notify
mq_open
mq_receive
mq_send
mq_setattr
mq_timedreceive
mq_timedsend
mq_unlink
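The filter semantics can be modeled in a few lines. The class ``SyscallFilter`` below is a hypothetical illustration, not the code of the ``trace`` command. It uses ``fnmatch.fnmatchcase`` for pattern matching, which also understands ``?`` and ``[...]`` and is therefore broader than the single ``*`` wildcard documented here.

.. code-block:: python

    from fnmatch import fnmatchcase


    class SyscallFilter:
        """Toy model of the filter: filtered system calls are not recorded."""

        def __init__(self, known_syscalls):
            self.known = list(known_syscalls)
            self.filtered = set()

        def apply(self, spec):
            """Apply one '+pattern' or '-pattern' argument."""
            if len(spec) < 2 or spec[0] not in "+-":
                raise ValueError(f"expected +name or -name, got {spec!r}")
            matches = {n for n in self.known if fnmatchcase(n, spec[1:])}
            if spec[0] == "+":
                self.filtered |= matches    # Add to the filter: not recorded.
            else:
                self.filtered -= matches    # Remove from the filter: recorded.

        def is_recorded(self, name):
            """Return True if calls to `name` would appear in the trace."""
            return name not in self.filtered

Note the polarity: adding a name with ``+`` hides that system call from the trace, and removing it with ``-`` makes it visible again.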
.. _trace_irq:
trace irq
---------
Configure the filter of the interrupt trace.
**Command Syntax:**
.. code-block::
trace irq [{+|-}<irqnum>...]
- ``+<irqnum>`` : Add the specified IRQ number to the filter.
The execution of the filtered IRQ handler is not recorded into the trace data.
- ``-<irqnum>`` : Remove the specified IRQ number from the filter.
Wildcard "``*``" can be used to specify all IRQs.
If no command parameters are specified, the current filter settings are displayed as follows.
**Example:**
.. code-block:: console
nsh> trace irq
Filtered IRQs: 2
11
15