Segfault Core Dump For ROS
Introduction
The ROS wiki (accessed May 2023) says that the core file is saved in $ROS_HOME, but for the life of me I could not find it. In this post I will describe how to generate the core dump, as well as how to run GDB with said core dump.
The results in this post were obtained on Ubuntu 20.04 with ROS Noetic.
Apport
Apport is the “core dumper” in Ubuntu. To check that this is indeed the case, one can run:
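As a minimal sketch, and assuming the standard Linux procfs location (the path below is my assumption, not something quoted from the ROS wiki), the kernel's core pattern can be printed like so:

```shell
# Assumption: the kernel exposes its core dump handler at this procfs path.
core_pattern_file=/proc/sys/kernel/core_pattern

if [ -r "$core_pattern_file" ]; then
    # A leading '|' in the output means core dumps are piped to a program.
    cat "$core_pattern_file"
else
    echo "cannot read $core_pattern_file on this system"
fi
```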
This will show the program the core dump is piped to, as well as how the core file will be named. Core dump files are then stored in /var/lib/apport/coredump/.
To get the core dump, one has to allow the core dump to take up disk space. This is done using the oft-quoted ulimit -c unlimited. This needs to be done in every terminal from which a core dump should be saved. To check that this is properly set, check that core file size is unlimited in the output of ulimit -a.
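Put together, the per-terminal setup might look like the sketch below. It only uses the ulimit shell builtin mentioned above; the guard and the messages are my own additions.

```shell
# Raise the core file size limit for this shell session only.
# This can fail if the hard limit is lower, so report that instead of aborting.
if ! ulimit -c unlimited 2>/dev/null; then
    echo "could not raise the core file size limit in this shell"
fi

# Print the current soft limit; "unlimited" means core dumps will not be truncated.
core_limit="$(ulimit -c)"
echo "core file size: $core_limit"
```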
With core file size set to unlimited for the particular terminal session, it is time to run the segfaulting program in that terminal. Once it has segfaulted, one can check the Apport logs with cat /var/log/apport.log.
This is helpful because it logs whether the segfault was detected, which program was the offender, and the name of the core dump file, if one was generated. This information is crucial for analysing the core dump file with GDB.
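To save some typing after a crash, the two locations mentioned above can be inspected together. This is a rough sketch: latest_core is a hypothetical helper of mine, and it assumes Apport's default log and core dump paths as given in this post.

```shell
# Hypothetical helper: print the name of the newest file in a directory
# (defaults to Apport's core dump directory).
latest_core() {
    dir="${1:-/var/lib/apport/coredump}"
    ls -t "$dir" 2>/dev/null | head -n 1
}

# Show the tail of the Apport log, if it exists on this machine.
if [ -r /var/log/apport.log ]; then
    tail -n 20 /var/log/apport.log
fi

echo "newest core file: $(latest_core)"
```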
Generating Core Dump
Armed with knowledge on how and where to find the core dump file, it is time to analyse the core dumps with GDB. To do this I use the segfault-example-node I’ve created to quickly generate some segfaults.
I ran the node twice, first without setting ulimit -c unlimited and then with it, checking the logs (cat /var/log/apport.log) after each run.
While in both instances the segfault is detected, only the latter generated the core dump file. After the core file is found, run GDB with the executable and the core file. The syntax is gdb <path_to_executable> <path_to_core_dump>. I ran this in a terminal on my example executable and its core file.
Well, that is not much help. It tells me that the offending function is channelCB1, but what if that function is 1000000 lines long? Notice, however, that GDB reports that no debug symbols were found. If the same executable can be rebuilt with debug symbols, then the core dump file can be used to find the last line of code that ran.
Luckily, for my example, this can be done with catkin build ros_segfault_example -DCMAKE_BUILD_TYPE=Debug
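For reference, the form documented by catkin_tools passes CMake definitions after --cmake-args. A small sketch, where build_debug is a hypothetical wrapper of mine and not part of catkin:

```shell
# Hypothetical wrapper: rebuild one package with debug symbols.
# Assumes catkin_tools is installed and this is run inside the catkin workspace.
build_debug() {
    pkg="$1"
    catkin build "$pkg" --cmake-args -DCMAKE_BUILD_TYPE=Debug
}

# Usage (uncomment inside a workspace):
# build_debug ros_segfault_example
```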
Running the same GDB command with the same core file as above now yields line numbers. Looks like line 24 is giving me problems. Typing info locals in GDB shows that I am truly caught with my pants down here: I was trying to access element 0 of a std::vector of length 0 and capacity 0. Not the best look.
Apart from info locals, other useful GDB commands to help in debugging include bt (backtrace), frame, and where. Of course, not to forget, q to quit GDB.
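These commands can also be run non-interactively, which is handy for saving the output to a file. A minimal sketch using GDB's -batch and -ex options; analyse_core is a hypothetical helper name, and the two paths are placeholders to be filled in:

```shell
# Hypothetical helper: print the backtrace and the local variables of the
# crashing frame from a core dump, then exit GDB.
analyse_core() {
    exe="$1"    # path to the executable
    core="$2"   # path to the core dump file
    gdb -batch -ex "bt" -ex "info locals" "$exe" "$core"
}

# Usage:
# analyse_core <path_to_executable> <path_to_core_dump>
```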
Finally, one may also run the program till the segfault directly in GDB, but this will be left as an exercise for the reader (hint: start GDB with the executable, and then type run).
Conclusion
There you have it, generating and analysing the core dump file in ROS. Remember to set the core file size via ulimit in each terminal where you run programs for which you want core dumps to be generated. While obvious, I reiterate for good measure that this has to be redone after a computer restart as well.