[TOC]

# Debugging C++ exception

I don't claim to be an expert with command-line debuggers (gdb, lldb), but they are useful for finding where code throws an exception or segfaults.

[debug-exception.cc](debug-exception.cc) is a simple test program with functions foo1, foo2, foo3, and foo4. Passing in 1 throws in foo1, 2 throws in foo2, etc. Just for fun it uses OpenMP, so there are multiple threads, but that doesn't really change anything.

The session below uses lldb on macOS; gdb has analogous functionality but different syntax. Lines starting with `###` are comments I added. You may have to play around with the syntax. I figured this out from
https://stackoverflow.com/questions/8122375/lldb-breakpoint-on-exceptions-equivalent-of-gdbs-catch-throw
but several of the syntaxes given there didn't work for me, maybe because they target different versions of lldb.

That said, I also agree that we should be throwing exceptions that carry more useful information. That CUDA code may have originated before we added `slate_cuda_call`, but it should be updated.

```sh
test/c++> make debug-exception
g++ -Wall -pedantic -std=c++11 -fopenmp -c -o debug-exception.o debug-exception.cc
g++ -fopenmp -o debug-exception debug-exception.o

### Run ./debug-exception 3, which throws in foo3.
thyme test/c++> ./debug-exception 3
main( 3 )
foo4( 3 )
foo3( 3, tid 0 )
foo3( 3, tid 1 )
foo3( 3, tid 2 )
terminate called recursively
terminate called recursively
foo3( 3, tid 3 )
Abort

### Now run it in the debugger.
thyme test/c++> lldb ./debug-exception
(lldb) target create "./debug-exception"
Current executable set to '/Users/mgates/Documents/test/c++/debug-exception' (x86_64).

### Set breakpoint on throwing C++ exceptions.
(lldb) break set -n __cxa_throw
Breakpoint 1: 2 locations.

### Run ./debug-exception 0, which doesn't throw an exception.
(lldb) run 0
Process 91619 launched: '/Users/mgates/Documents/test/c++/debug-exception' (x86_64)
main( 0 )
foo4( 0 )
foo3( 0, tid 1 )
foo2( 0, tid 1 )
foo1( 0, tid 1 )
foo3( 0, tid 1 )
foo2( 0, tid 1 )
foo1( 0, tid 1 )
foo3( 0, tid 0 )
foo2( 0, tid 0 )
foo1( 0, tid 0 )
foo3( 0, tid 0 )
foo3( 0, tid 3 )
foo3( 0, tid 2 )
foo2( 0, tid 2 )
foo1( 0, tid 2 )
foo2( 0, tid 3 )
foo1( 0, tid 3 )
foo3( 0, tid 3 )
foo2( 0, tid 3 )
foo3( 0, tid 2 )
foo2( 0, tid 0 )
foo2( 0, tid 2 )
foo1( 0, tid 3 )
foo1( 0, tid 2 )
foo3( 0, tid 1 )
foo2( 0, tid 1 )
foo1( 0, tid 1 )
foo1( 0, tid 0 )
foo3( 0, tid 0 )
foo2( 0, tid 0 )
foo1( 0, tid 0 )
Process 91619 exited with status = 0 (0x00000000)

### Run ./debug-exception 2, which throws an exception in foo2.
(lldb) run 2
Process 91625 launched: '/Users/mgates/Documents/test/c++/debug-exception' (x86_64)
main( 2 )
foo4( 2 )
foo3( 2, tid 0 )
foo2( 2, tid 0 )
foo3( 2, tid 2 )
foo2( 2, tid 2 )
foo3( 2, tid 1 )
Process 91625 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
  thread #3, stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
Target 0: (debug-exception) stopped.

### Looking at the backtrace (bt), we see it was in foo2.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
    frame #1: 0x0000000100003ac2 debug-exception`foo2(int, int) + 104
    frame #2: 0x0000000100003b4f debug-exception`foo3(int, int) + 119
    frame #3: 0x0000000100003d0a debug-exception`foo4(int) (._omp_fn.0) + 100
    frame #4: 0x0000000100452bd2 libgomp.1.dylib`GOMP_parallel + 66
    frame #5: 0x0000000100003bd9 debug-exception`foo4(int) + 131
    frame #6: 0x0000000100003c3a debug-exception`main + 90
    frame #7: 0x00007fff6ce1bcc9 libdyld.dylib`start + 1

### Run ./debug-exception 3, which throws an exception in foo3.
(lldb) kill
Process 91625 exited with status = 9 (0x00000009)
(lldb) run 3
Process 91633 launched: '/Users/mgates/Documents/test/c++/debug-exception' (x86_64)
main( 3 )
foo4( 3 )
foo3( 3, tid 0 )
foo3( 3, tid 1 )
foo3( 3, tid 2 )
Process 91633 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
  thread #2, stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
Target 0: (debug-exception) stopped.

### Looking at the backtrace (bt), we see it was in foo3.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
    frame #1: 0x0000000100003b40 debug-exception`foo3(int, int) + 104
    frame #2: 0x0000000100003d0a debug-exception`foo4(int) (._omp_fn.0) + 100
    frame #3: 0x0000000100452bd2 libgomp.1.dylib`GOMP_parallel + 66
    frame #4: 0x0000000100003bd9 debug-exception`foo4(int) + 131
    frame #5: 0x0000000100003c3a debug-exception`main + 90
    frame #6: 0x00007fff6ce1bcc9 libdyld.dylib`start + 1
(lldb) kill
Process 91633 exited with status = 9 (0x00000009)
(lldb) ^D
```

# Debugging MPI

--------------------------------------------------------------------------------

## Recompile SLATE with debugging

SLATE needs to be compiled with the `-g` flag, and at least test/test.o needs to be compiled with `-O0`. Here's my make.inc file on leconte showing the additions:

```
slate> cat make.inc
CXX = mpicxx
FC = mpif90

#################### Added these lines ####################
CXXFLAGS = -g -Wno-unused-variable

# This is in SLATE's GNUmakefile since https://bitbucket.org/icl/slate-dev/pull-requests/137
# For `tester --debug` purposes, compile test.o with -O0 (after -O3).
test/test.o: CXXFLAGS += -O0
#################### ####################

# BLAS can be mkl or openblas (or others on other systems). Choose one.
blas = mkl
#blas = openblas

# Intel MKL supports gfortran conventions and ifort conventions.
# Choose one to match mpif90 compiler.
blas_fortran = gfortran
#blas_fortran = ifort

# Intel MKL supports Open MPI and Intel MPI.
# Choose one to match MPI library.
#mkl_blacs = openmpi
mkl_blacs = intelmpi

cuda_arch = volta
gpu_backend = cuda
```

For instance, when I compile test.o, the command is:

```
mpicxx -g -Wno-unused-variable -O3 -std=c++17 \
    -Wall -Wshadow -pedantic -MMD -fPIC -fopenmp \
    -DSLATE_WITH_MKL -DSLATE_NO_HIP \
    -I./blaspp/include -I./lapackpp/include -I./include -I./src \
    -O0 -I./testsweeper -c test/test.cc -o test/test.o
```

where the later `-O0` overrides the earlier `-O3`.
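
As far as these notes indicate, test.o needs `-O0` because `tester --debug` waits in a loop on a variable that the debugger must be able to see and modify. The following is a minimal sketch of that wait-for-debugger pattern, not SLATE's actual code. The name `wait_for_debugger` and the `max_polls` and `poll_ms` parameters are hypothetical; the poll limit is added only so the loop can end without a debugger attached, and MPI is left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical helper, not SLATE's implementation.
// Polls `attached` until a debugger sets it non-zero (e.g., `expr attached=1`
// in lldb) or until max_polls polls have elapsed. Returns the number of polls.
// `volatile` forces the compiler to re-read the flag on every iteration.
int wait_for_debugger( volatile int& attached, int max_polls, int poll_ms )
{
    assert( max_polls >= 0 && poll_ms >= 0 );
    std::printf( "pid %d ready for debugger (gdb/lldb) to attach.\n",
                 static_cast<int>( getpid() ) );
    std::fflush( stdout );

    int polls = 0;
    while (0 == attached && polls < max_polls) {
        std::this_thread::sleep_for( std::chrono::milliseconds( poll_ms ) );
        ++polls;
    }
    return polls;
}
```

With `volatile int flag = 0;`, a call such as `wait_for_debugger( flag, 60, 1000 )` would wait up to roughly a minute for a debugger to set the flag. In an optimized build the debugger may not be able to find or modify such a variable, which is consistent with building test.o with `-O0`.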

--------------------------------------------------------------------------------

## Run tester with MPI

Here's an example that fails. I added `Tile A00 = A( 0, 0 );` in src/internal/internal_gemm.cc, which throws on ranks where A( 0, 0 ) doesn't exist.

```
slate/test> mpirun -np 4 ./tester gemm
SLATE version 2022.05.00, id 483bde4a
input: ./tester gemm
2022-06-02 14:38:26, MPI size 4, OpenMP threads 20, GPU devices available 8

type  origin  target  m    ...  error     time (s)  ...  status
d     host    task    100  ...  3.11e-16  0.000441  ...  pass
d     host    task    200  ...  4.60e-16  0.00149   ...  pass
d     host    task    300  ...  2.92e-16  0.00304   ...  pass
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
```

Adding the `--debug R` flag to the tester causes rank R to wait for a debugger to attach (here, R = 1).

```
slate/test> mpirun -np 4 ./tester --debug 1 gemm
MPI rank 1, pid 71503 on leconte.icl.utk.edu ready for debugger (gdb/lldb) to attach.
After attaching, step out to run() and set i=1, e.g.:
lldb -p 71503
(lldb) break set -n __cxa_throw  # break on C++ exception
(lldb) thread step-out  # repeat
(lldb) expr i=1
(lldb) continue
```

Rank 1 waits here for a debugger to attach. Once a debugger attaches and continues execution (see below), the tester keeps going.

```
SLATE version 2022.05.00, id 483bde4a
input: ./tester --debug 1 gemm
2022-06-02 14:41:31, MPI size 4, OpenMP threads 20, GPU devices available 8

type  origin  target  m    ...  error     time (s)  ...  status
d     host    task    100  ...  3.14e-16  0.000483  ...  pass
d     host    task    200  ...  4.48e-16  0.00147   ...  pass
d     host    task    300  ...  2.83e-16  0.00319   ...  pass
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 71502 RUNNING AT leconte.icl.utk.edu
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
```

--------------------------------------------------------------------------------

## lldb session

Run the `lldb` or `gdb` debugger in a separate terminal, attaching to the tester process per the instructions that SLATE's tester printed (above).

```
> lldb -p 71503
Process 71503 stopped
* thread #1, name = 'tester', stop reason = signal SIGSTOP
    frame #0: 0x00007f340f8c89fd libc.so.6`__nanosleep + 45
libc.so.6`__nanosleep:
->  0x7f340f8c89fd <+45>: movq   (%rsp), %rdi
    0x7f340f8c8a01 <+49>: movq   %rax, %rdx
    0x7f340f8c8a04 <+52>: callq  0x7f340f90f890  ; __libc_disable_asynccancel
    0x7f340f8c8a09 <+57>: movq   %rdx, %rax
  thread #2, name = 'cuda-EvtHandlr', stop reason = signal SIGSTOP
    frame #0: 0x00007f340f8f6ddd libc.so.6`poll + 45
libc.so.6`poll:
->  0x7f340f8f6ddd <+45>: movq   (%rsp), %rdi
    0x7f340f8f6de1 <+49>: movq   %rax, %rdx
    0x7f340f8f6de4 <+52>: callq  0x7f340f90f890  ; __libc_disable_asynccancel
    0x7f340f8f6de9 <+57>: movq   %rdx, %rax
```

Break on C++ exceptions:

```
(lldb) break set -n __cxa_throw
Breakpoint 2: where = libstdc++.so.6`__cxxabiv1::__cxa_throw(void *, std::type_info *, void (*)(void *)) at eh_throw.cc:77:1, address = 0x00007f34103cdff0
```

I've found that other MPI ranks sometimes cause the whole program to abort before there is time to run debugger commands manually, so it's helpful to do a backtrace and disassembly automatically on every stop. Here's a stop hook from an [lldb cheat sheet](https://www.nesono.com/sites/default/files/lldb%20cheat%20sheet.pdf).

```
(lldb) target stop-hook add
Enter your stop hook command(s).  Type 'DONE' to end.
> bt
> disassemble --pc
DONE
Stop hook #1 added.
```

Initially the debugger will probably be stopped in some system sleep routine (`__nanosleep`). Use `thread step-out` a few times until it shows the SLATE tester source code with `while (0 == i)`.

```
(lldb) thread step-out
* thread #1, name = 'tester', stop reason = step out
  * frame #0: 0x00007f340f8c8894 libc.so.6`sleep + 212
    frame #1: 0x000000000045d2c7 tester`run(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:651:22
    frame #2: 0x000000000045db42 tester`main(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:764:21
    frame #3: 0x00007f340f825555 libc.so.6`__libc_start_main + 245
    frame #4: 0x0000000000459677 tester`_start + 41
libc.so.6`sleep:
->  0x7f340f8c8894 <+212>: movl   %eax, %ebx
    0x7f340f8c8896 <+214>: testl  %ebx, %ebx
    0x7f340f8c8898 <+216>: je     0x7f340f8c88c0  ; <+256>
    0x7f340f8c889a <+218>: xorl   %ebp, %ebp
Process 71503 stopped
* thread #1, name = 'tester', stop reason = step out
    frame #0: 0x00007f340f8c8894 libc.so.6`sleep + 212
libc.so.6`sleep:
->  0x7f340f8c8894 <+212>: movl   %eax, %ebx
    0x7f340f8c8896 <+214>: testl  %ebx, %ebx
    0x7f340f8c8898 <+216>: je     0x7f340f8c88c0  ; <+256>
    0x7f340f8c889a <+218>: xorl   %ebp, %ebp

(lldb) thread step-out
* thread #1, name = 'tester', stop reason = step out
  * frame #0: 0x000000000045d2c7 tester`run(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:650:13
    frame #1: 0x000000000045db42 tester`main(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:764:21
    frame #2: 0x00007f340f825555 libc.so.6`__libc_start_main + 245
    frame #3: 0x0000000000459677 tester`_start + 41
tester`run:
->  0x45d2c7 <+2711>: jmp    0x45d2ae  ; <+2686> at test.cc:650:22
    0x45d2c9 <+2713>: movl   $0x44000000, %edi  ; imm = 0x44000000
    0x45d2ce <+2718>: callq  0x435230  ; symbol stub for: MPI_Barrier
    0x45d2d3 <+2723>: movl   %eax, -0x64(%rbp)
Process 71503 stopped
* thread #1, name = 'tester', stop reason = step out
    frame #0: 0x000000000045d2c7 tester`run(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:650:13
   647                  "(lldb) continue\n",
   648                  mpi_rank, getpid(), hostname, getpid() );
   649          fflush( stdout );
-> 650          while (0 == i)
   651              sleep(1);
   652      }
   653      slate_mpi_call( MPI_Barrier( MPI_COMM_WORLD ) );
```

Running `expr i=1` sets `i` so that the while loop exits. If the debugger doesn't know the variable `i`, check that you compiled with `-g` and `-O0`.

```
(lldb) expr i=1
(volatile int) $1 = 1
```

Continue running until a C++ exception or breakpoint occurs, or the program completes. Here it stopped at a C++ exception. Frame #5 of the backtrace shows it was thrown from `slate::internal::gemm` at internal_gemm.cc line 76, which is indeed where the error was injected.

```
(lldb) continue
Process 71503 resuming
  thread #11, name = 'tester', stop reason = breakpoint 1.1 2.1
    frame #0: 0x00007f34103cdff0 libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x00007f3198000960, tinfo=0x00007f34106e9228, dest=(libstdc++.so.6`std::out_of_range::~out_of_range() at stdexcept.cc:65:33))(void *)) at eh_throw.cc:77:1
    frame #1: 0x00007f34103c5352 libstdc++.so.6`std::__throw_out_of_range(__s="map::at") at functexcept.cc:82:5
    frame #2: 0x00000000004b3c4c tester`slate::BaseMatrix::operator()(long, long, int) at stl_map.h:541:24
    frame #3: 0x00000000004b3c40 tester`slate::BaseMatrix::operator()(long, long, int) at MatrixStorage.hh:388
    frame #4: 0x00000000004b3c40 tester`slate::BaseMatrix::operator(this=0x00007f330cb6edc0, i=0, j=0, device=-1)(long, long, int) at BaseMatrix.hh:1236
    frame #5: 0x00007f34239b696e libslate.so`void slate::internal::gemm((null)=TargetType<(slate::Target)84> @ 0x00007f330cb6ece0, alpha=3.1415926535897931, A=0x00007f330cb6edc0, B=0x00007f330cb6ed40, beta=2.7182818284590451, C=0x00007ffe1f6cde40, layout=ColMajor, priority=0, queue_index=, opts=error: summary string parsing error)84>, double, slate::Matrix&, slate::Matrix&, double, slate::Matrix&, blas::Layout, int, long, std::map, std::allocator > > const&) at internal_gemm.cc:76:13
    frame #6: 0x00007f34239b6f77 libslate.so`void slate::internal::gemm<(slate::Target)84, double>(alpha=, A=, B=, beta=, C=, layout=, priority=, queue_index=, opts=error: summary string parsing error) at internal_gemm.cc:52:9
    frame #7: 0x00007f3423d4f3ad libslate.so`_ZN5slate5gemmCILNS_6TargetE84EdEEvT0_RNS_6MatrixIS2_EES5_S2_S5_RKSt3mapINS_6OptionENS_11OptionValueESt4lessIS7_ESaISt4pairIKS7_S8_EEE._omp_fn.4((null)=0x00007f330cb6edc0) at gemmC.cc:106:35
    frame #8: 0x00007f340fdfe1f4 libgomp.so.1`gomp_barrier_handle_tasks(state=320) at task.c:1387:6
    frame #9: 0x00007f340fe05818 libgomp.so.1`gomp_team_barrier_wait_end(bar=, state=320) at bar.c:116:4
    frame #10: 0x00007f340fe02e32 libgomp.so.1`gomp_thread_start(xdata=) at team.c:124:4
    frame #11: 0x00007f341bab7ea5 libpthread.so.0`start_thread + 197
    frame #12: 0x00007f340f901b0d libc.so.6`__clone + 109
libstdc++.so.6`__cxxabiv1::__cxa_throw(void *, std::type_info *, void (*)(void *)):
->  0x7f34103cdff0 <+0>: pushq  %r13
    0x7f34103cdff2 <+2>: movq   %rdx, %r13
    0x7f34103cdff5 <+5>: pushq  %r12
    0x7f34103cdff7 <+7>: movq   %rsi, %r12
Process 71503 stopped
* thread #11, name = 'tester', stop reason = breakpoint 1.1 2.1
    frame #0: 0x00007f34103cdff0 libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x00007f3198000960, tinfo=0x00007f34106e9228, dest=(libstdc++.so.6`std::out_of_range::~out_of_range() at stdexcept.cc:65:33))(void *)) at eh_throw.cc:77:1
Process 71503 exited with status = -1 (0xffffffff)
debugserver died with an exit status of 0x00000000
```

## lldb startup

Initial commands can be put into an init.lldb file:

```
break set -n __cxa_throw
target stop-hook add
bt
disassemble --pc
DONE
```

which is sourced when running lldb:

```
lldb -s init.lldb -p 71503
```
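
Returning to the earlier point about throwing exceptions that carry more useful information: the `map::at` message in the MPI example says nothing about where the error came from, which is why a debugger was needed. The following is a minimal sketch of an exception that records where it was thrown. `LocatedError`, `throw_if`, and `tile_value` are hypothetical names for illustration; this is not `slate_cuda_call` or any existing SLATE macro.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type that appends function, file, and line
// to the message, so what() alone says where the error was raised.
class LocatedError : public std::runtime_error {
public:
    LocatedError( const std::string& msg,
                  const char* func, const char* file, int line )
        : std::runtime_error(
              msg + ", in " + func + " at " + file + ":"
              + std::to_string( line ) )
    {}
};

// Hypothetical macro: throws LocatedError if cond is true,
// using the text of the condition as the message.
#define throw_if( cond ) \
    do { \
        if (cond) \
            throw LocatedError( #cond, __func__, __FILE__, __LINE__ ); \
    } while (0)

// Stand-in for an indexed lookup that can fail, such as A( i, j ).
double tile_value( int i, int nt )
{
    assert( nt >= 0 );  // precondition on the caller, not a runtime error
    throw_if( i < 0 || i >= nt );
    return 1.0 * i;
}
```

Catching `LocatedError` and printing `e.what()` would report the failed condition, function, file, and line without a debugger. Since it is an ordinary C++ throw, the `__cxa_throw` breakpoint should still stop on it when a full backtrace is wanted.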