Introduction

The ROS wiki (accessed May 2023) says that the core file is saved in $ROS_HOME, but for the life of me I could not find it there. In this post I describe how to generate the core dump and how to analyse it with GDB.

The results in this post were obtained on Ubuntu 20.04 with ROS Noetic.

Apport

Apport is the “core dumper” in Ubuntu. To check that this is indeed the case, one can run:

cat /proc/sys/kernel/core_pattern

This shows the program that the core dump is piped to, along with the pattern that determines how the core file is named. Core dump files are then stored in /var/lib/apport/coredump/.
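The kernel treats a core_pattern that starts with | as a command to pipe the core into; anything else is a file name template. The check can be scripted as below. This is only a minimal sketch: describe_core_pattern is a hypothetical helper name, not part of Apport or ROS.

```shell
# Hypothetical helper: say whether a core_pattern value pipes the core
# to a program (leading '|') or writes it to a file name template.
describe_core_pattern() {
    case "$1" in
        '|'*)
            prog=${1#?}                      # drop the leading '|'
            echo "piped to program: ${prog%% *}"   # keep only the first word
            ;;
        *)
            echo "written to file pattern: $1"
            ;;
    esac
}

# Only read the live value where the file exists (Linux).
if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

If Apport is the handler, the reported program should be the apport executable.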

To get the core dump, one has to allow the core dump to take up disk space. This is done using the oft-quoted ulimit -c unlimited. This needs to be done in every terminal from which a core dump should be saved. To check that this is properly set, check that core file size is unlimited in ulimit -a.
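To avoid scanning the whole ulimit -a listing, the core limit can be queried on its own with ulimit -c. The sketch below interprets that value; core_limit_status is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper: interpret the value printed by `ulimit -c`,
# which is either "unlimited" or a number of blocks.
core_limit_status() {
    case "$1" in
        unlimited) echo "core dumps enabled (no size cap)" ;;
        0)         echo "core dumps disabled" ;;
        *)         echo "core dumps capped at $1 blocks" ;;
    esac
}

# Report the soft limit of the current shell.
core_limit_status "$(ulimit -c)"
```

If this reports that core dumps are disabled, run ulimit -c unlimited in the same terminal before starting the program. The change applies only to that shell and the processes started from it.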

With core file size set to unlimited for the particular terminal session, it is time to run the segfaulting program in that terminal. Once it has segfaulted, one can check Apport logs with:

cat /var/log/apport.log

This is helpful because it logs whether the segfault was detected, which program was the offender, and the name of the core dump file, if one was generated. This information is crucial for analysing the core dump file with GDB.
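Because the log records the core file name, pulling it out can be scripted. This is a minimal sketch, assuming Apport writes a line containing "writing core dump to" followed by the file name, as in the sample log further down in this post. last_core_name is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of the most recently logged core file.
# Assumption: the log has lines of the form
#   "... writing core dump to core.<something> (limit: -1)"
last_core_name() {
    log=${1:-/var/log/apport.log}
    # Keep only the file name from matching lines, then take the last one.
    sed -n 's/.*writing core dump to \(core\.[^ ]*\).*/\1/p' "$log" | tail -n 1
}

# Print the newest core file name if the log is readable on this machine.
if [ -r /var/log/apport.log ]; then
    last_core_name
fi
```

The printed name can then be appended to /var/lib/apport/coredump/ to get the full path of the core file.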

Generating Core Dump

Armed with knowledge on how and where to find the core dump file, it is time to analyse the core dumps with GDB. To do this I use the segfault-example-node I’ve created to quickly generate some segfaults.

First, without setting ulimit -c unlimited, the logs (cat /var/log/apport.log) may look like:

ERROR: apport (pid 5800) Tue May 23 16:00:49 2023: called for pid 5724, signal 11, core limit 0, dump mode 1
ERROR: apport (pid 5800) Tue May 23 16:00:49 2023: executable: /home/bronya/catkin_ws/devel/.private/ros_segfault_example/lib/ros_segfault_example/segfault_example_node (command line "/home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node")
ERROR: apport (pid 5800) Tue May 23 16:00:49 2023: executable does not belong to a package, ignoring

Whereas with ulimit -c unlimited, the logs (cat /var/log/apport.log) may look like:

ERROR: apport (pid 5835) Tue May 23 16:04:05 2023: called for pid 5810, signal 11, core limit 18446744073709551615, dump mode 1
ERROR: apport (pid 5835) Tue May 23 16:04:05 2023: ignoring implausibly big core limit, treating as unlimited
ERROR: apport (pid 5835) Tue May 23 16:04:05 2023: executable: /home/bronya/catkin_ws/devel/.private/ros_segfault_example/lib/ros_segfault_example/segfault_example_node (command line "/home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node")
ERROR: apport (pid 5835) Tue May 23 16:04:05 2023: executable does not belong to a package, ignoring
ERROR: apport (pid 5835) Tue May 23 16:04:05 2023: writing core dump to core._home_bronya_catkin_ws_devel__private_ros_segfault_example_lib_ros_segfault_example_segfault_example_node.1000.dc4d4c47-626c-4b71-b2fa-a73bee89fc05.5810.577534 (limit: -1)

While the segfault is detected in both instances, only the latter generated the core dump file. Once the core file is found, run GDB with the executable and the core file. The syntax is gdb <path_to_executable> <path_to_core_dump>. In my example, I ran the following in a terminal:

gdb /home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node /var/lib/apport/coredump/core._home_bronya_catkin_ws_devel__private_ros_segfault_example_lib_ros_segfault_example_segfault_example_node.1000.dc4d4c47-626c-4b71-b2fa-a73bee89fc05.5810.577534

Which produces:

Reading symbols from /home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node...
(No debugging symbols found in /home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node)

warning: core file may not match specified executable file.
[New LWP 5810]
[New LWP 5823]
[New LWP 5822]
[New LWP 5825]
[New LWP 5824]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005561ced02959 in channelCB1(boost::shared_ptr<std_msgs::String_<std::allocator<void> > const> const&) ()
[Current thread is 1 (Thread 0x7f3d1e8bf780 (LWP 5810))]

Well, that is not much help. It tells me that the offending function is channelCB1, but what if that function is 1000000 lines long? Notice that GDB reports that no debugging symbols were found. If the same executable can be rebuilt with debug symbols, then the core dump file can be used to pinpoint the last line of code that ran.

Luckily, for my example, this can be done with catkin build ros_segfault_example --cmake-args -DCMAKE_BUILD_TYPE=Debug

Running the same GDB command with the same corefile as above yields:

Reading symbols from /home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node...

warning: core file may not match specified executable file.
[New LWP 5810]
[New LWP 5823]
[New LWP 5822]
[New LWP 5825]
[New LWP 5824]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/bronya/catkin_ws/devel/lib/ros_segfault_example/segfault_example_node'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005561ced02959 in channelCB1 (msg=...)
    at /home/bronya/catkin_ws/src/ros_segfault_example/src/segfault_example_node.cpp:24
24	    d = v1[i];
[Current thread is 1 (Thread 0x7f3d1e8bf780 (LWP 5810))]

Looks like line 24 is giving me problems. Typing info locals in GDB gives:

(gdb) info locals
i = 0
v1 = std::vector of length 0, capacity 0
d = 1

I am truly caught with my pants down here. I was trying to access element 0 of a std::vector of length 0 and capacity 0. Not the best look.

Apart from info locals, other useful GDB commands to help in debugging include bt (backtrace), frame, and where. Of course, not to forget, q to quit GDB.
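For repeated crashes, the same inspection can be run without an interactive session. This is a minimal sketch, assuming gdb is on the PATH. postmortem is a hypothetical helper name, while -batch and -ex are standard GDB command-line options.

```shell
# Hypothetical helper: print a backtrace and the locals of the crashing
# frame from a core file, then exit GDB automatically.
# Usage: postmortem <path_to_executable> <path_to_core_dump>
postmortem() {
    exe=$1
    core=$2
    [ -f "$exe" ]  || { echo "missing executable: $exe" >&2; return 1; }
    [ -f "$core" ] || { echo "missing core file: $core" >&2; return 1; }
    # -batch exits after running the -ex commands in order.
    gdb -batch -ex "bt" -ex "info locals" "$exe" "$core"
}
```

The arguments are the same two paths used in the interactive gdb command above. As before, the output is only as detailed as the debug symbols in the executable allow.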

Finally, one may also run the program until it segfaults directly in GDB, but this will be left as an exercise for the reader (hint: start GDB with the executable, and then use run).

Conclusion

There you have it: generating and analysing the core dump file in ROS. Remember to set the core file size via ulimit in each terminal where you run programs whose core dumps you want generated. While obvious, I reiterate for good measure that this has to be done again after a computer restart as well.
