@@ -195,6 +195,11 @@ automatically debug the ``gdbserver`` process as it's created. However this
195
195
author has not been able to get either to work in this scenario so we suggest
196
196
making a more specific command wherever possible instead.
197
197
198
+ Another option is to let ``lldb-server`` start up, then attach to the process
199
+ that's interesting to you. It's less automated and won't work if the bug occurs
200
+ during startup. However, it is a good way to confirm you've found the right one;
201
+ then you can take its command line and run that directly.
202
+
198
203
Output From ``lldb-server``
199
204
***************************
200
205
@@ -258,3 +263,320 @@ then ``lldb B`` to trigger ``lldb-server B`` to go into that code and hit the
258
263
breakpoint. ``lldb-server A`` is only here to let us debug ``lldb-server B``
259
264
remotely.
260
265
266
+ Debugging The Remote Protocol
267
+ -----------------------------
268
+
269
+ LLDB mostly follows the `GDB Remote Protocol <https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html>`_.
270
+ Where there are differences it tries to handle both LLDB and GDB behaviour.
271
+
272
+ LLDB does have extensions to the protocol which are documented in
273
+ `lldb-gdb-remote.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-gdb-remote.txt>`_
274
+ and `lldb/docs/lldb-platform-packets.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-platform-packets.txt>`_.
275
+
276
+ Logging Packets
277
+ ***************
278
+
279
+ If you just want to observe packets, you can enable the ``gdb-remote packets``
280
+ log channel.
281
+
282
+ ::
283
+
284
+ (lldb) log enable gdb-remote packets
285
+ (lldb) run
286
+ lldb < 1> send packet: +
287
+ lldb history[1] tid=0x264bfd < 1> send packet: +
288
+ lldb < 19> send packet: $QStartNoAckMode#b0
289
+ lldb < 1> read packet: +
290
+
291
+ You can do this on the ``lldb-server`` end as well by passing the option
292
+ ``--log-channels "gdb-remote packets"``. Then you'll see both sides of the
293
+ connection.
294
+
295
+ Some packets may be printed in a nicer way than others. For example XML packets
296
+ will print the literal XML, some binary packets may be decoded. Others will just
297
+ be printed unmodified. So do check what format you expect; a common one is hex
298
+ encoded bytes.
299
+
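When reading a packet log by hand it can help to check a packet's framing and decode any hex encoded fields. The following is a minimal Python sketch, not part of LLDB. It assumes the ``$payload#xx`` framing from the GDB Remote Protocol, where ``xx`` is the sum of the payload bytes modulo 256. It does not handle escaped bytes or run-length encoding, and the function names and the hex string in the usage lines are hypothetical.

```python
def checksum(payload: str) -> str:
    """Return the two hex digit checksum: sum of payload bytes modulo 256."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def split_packet(packet: str) -> str:
    """Return the payload of a '$payload#xx' packet after verifying 'xx'."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError(f"not a framed packet: {packet!r}")
    payload, received = packet[1:-3], packet[-2:]
    if checksum(payload) != received.lower():
        raise ValueError(f"bad checksum for {packet!r}")
    return payload


def decode_hex(text: str) -> bytes:
    """Decode a hex encoded field, as used by many packets, into raw bytes."""
    return bytes.fromhex(text)


# The packet from the log above.
print(split_packet("$QStartNoAckMode#b0"))
# A made up hex encoded field, here the path of the test program.
print(decode_hex("2f746d702f746573742e6f"))
```

Checking the checksum by hand is rarely needed once ``QStartNoAckMode`` has been negotiated, but decoding hex fields is a common step when comparing what was sent with what you expected.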
300
+ You can enable this logging even when you are connecting to an ``lldb-server``
301
+ in platform mode, as this protocol is used for that too.
302
+
303
+ Debugging Packet Exchanges
304
+ **************************
305
+
306
+ Say you want to make ``lldb`` send a packet to ``lldb-server``, then debug
307
+ how the latter builds its response. Maybe even see how ``lldb`` handles it once
308
+ it's sent back.
309
+
310
+ That all takes time, so LLDB will likely time out and think the remote has gone
311
+ away. You can change the ``plugin.process.gdb-remote.packet-timeout`` setting
312
+ to prevent this.
313
+
314
+ Here's an example. First we'll start an ``lldb-server`` being debugged by
315
+ ``lldb``, placing a breakpoint on a packet handler we know will be hit once
316
+ another ``lldb`` connects.
317
+
318
+ ::
319
+
320
+ $ lldb -- lldb-server gdbserver :1234 -- /tmp/test.o
321
+ <...>
322
+ (lldb) b GDBRemoteCommunicationServerCommon::Handle_qSupported
323
+ Breakpoint 1: where = <...>
324
+ (lldb) run
325
+ <...>
326
+
327
+ Next we connect another ``lldb`` to this, with a timeout of 5 minutes:
328
+
329
+ ::
330
+
331
+ $ lldb /tmp/test.o
332
+ <...>
333
+ (lldb) settings set plugin.process.gdb-remote.packet-timeout 300
334
+ (lldb) gdb-remote 1234
335
+
336
+ Doing so triggers the breakpoint in ``lldb-server``, bringing us back into
337
+ ``lldb``. Now we've got 5 minutes to do whatever we need before LLDB decides
338
+ the connection has failed.
339
+
340
+ ::
341
+
342
+ * thread #1, name = 'lldb-server', stop reason = breakpoint 1.1
343
+ frame #0: 0x0000aaaaaacc6848 lldb-server<...>
344
+ lldb-server`lldb_private::process_gdb_remote::GDBRemoteCommunicationServerCommon::Handle_qSupported:
345
+ -> 0xaaaaaacc6848 <+0>: sub sp, sp, #0xc0
346
+ <...>
347
+ (lldb)
348
+
349
+ Once you're done, simply ``continue`` the ``lldb-server``. Back in the other
350
+ ``lldb``, the connection process will continue as normal.
351
+
352
+ ::
353
+
354
+ Process 2510266 stopped
355
+ * thread #1, name = 'test.o', stop reason = signal SIGSTOP
356
+ frame #0: 0x0000fffff7fcd100 ld-2.31.so`_start
357
+ ld-2.31.so`_start:
358
+ -> 0xfffff7fcd100 <+0>: mov x0, sp
359
+ <...>
360
+ (lldb)
361
+
362
+ Reducing Bugs
363
+ -------------
364
+
365
+ This section covers reducing a bug that happens in LLDB itself, or where you
366
+ suspect that LLDB causes something else to behave abnormally.
367
+
368
+ Since bugs vary wildly, the advice here is general and incomplete. Let your
369
+ instincts guide you and don't feel the need to try everything before reporting
370
+ an issue or asking for help. This is simply inspiration.
371
+
372
+ Reduction
373
+ *********
374
+
375
+ The first step is to reduce unneeded complexity where it is cheap to do so. If
376
+ something is easily removed or frozen to a certain value, do so. The goal is to
377
+ keep the failure mode the same, with fewer dependencies.
378
+
379
+ This includes, but is not limited to:
380
+
381
+ * Removing test cases that don't crash.
382
+ * Replacing dynamic lookups with constant values.
383
+ * Replacing supporting functions with stubs that do nothing.
384
+ * Moving the test case to a less unique system. If your machine has an exotic
385
+ extension, try it on a readily available commodity machine.
386
+ * Removing irrelevant parts of the test program.
387
+ * Reproducing the issue without using the LLDB test runner.
388
+ * Converting a remote debugging scenario into a local one.
389
+
390
+ Now we hopefully have a smaller reproducer than we started with. Next we need to
391
+ find out what components of the software stack might be failing.
392
+
393
+ Some examples are listed below with suggestions for how to investigate them.
394
+
395
+ * Debugger
396
+
397
+ * Use a `released version of LLDB <https://github.com/llvm/llvm-project/releases>`_.
398
+
399
+ * If on macOS, try the system ``lldb``.
400
+
401
+ * Try GDB or any other system debugger you might have e.g. Microsoft Visual
402
+ Studio.
403
+
404
+ * Kernel
405
+
406
+ * Start a virtual machine running a different version. ``qemu-system`` is
407
+ useful here.
408
+
409
+ * Try a different physical system running a different version.
410
+
411
+ * Remember that for most kernels, userspace crashing the kernel is always a
412
+ kernel bug. Even if the userspace program is doing something unconventional.
413
+ So it could be a bug in the application and the kernel.
414
+
415
+ * Compiler and compiler options
416
+
417
+ * Try other versions of the same compiler or your system compiler.
418
+
419
+ * Emit older versions of DWARF info, particularly DWARFv4 instead of v5, as some tools
420
+ did/do not understand the new constructs.
421
+
422
+ * Reduce optimisation options as much as possible.
423
+
424
+ * Try all the language modes e.g. C++17/20 for C++.
425
+
426
+ * Link against LLVM's libcxx if you suspect a bug involving the system C++
427
+ library.
428
+
429
+ * For languages other than C/C++ e.g. Rust, try making an equivalent program
430
+ in C/C++. LLDB tends to try to fit other languages into a C/C++ mould, so
431
+ porting the program can make triage and reporting much easier.
432
+
433
+ * Operating system
434
+
435
+ * Use docker to try various versions of Linux.
436
+
437
+ * Use ``qemu-system`` to emulate other operating systems e.g. FreeBSD.
438
+
439
+ * Architecture
440
+
441
+ * Use `QEMU user space emulation <https://www.qemu.org/docs/master/user/main.html>`_
442
+ to quickly test other architectures. Note that ``lldb-server`` cannot be used
443
+ with this as the ptrace APIs are not emulated.
444
+
445
+ * If you need to test a big endian system use QEMU to emulate s390x (user
446
+ space emulation for just ``lldb``, ``qemu-system`` for testing
447
+ ``lldb-server``).
448
+
449
+ .. note:: When using QEMU you may need to use the built in GDB stub, instead of
450
+ ``lldb-server``. For example if you wanted to debug ``lldb`` running
451
+ inside ``qemu-user-s390x`` you would connect to the GDB stub provided
452
+ by QEMU.
453
+
454
+ The same applies if you want to see how ``lldb`` would debug a test
455
+ program that is running on s390x. It's not totally accurate because
456
+ you're not using ``lldb-server``, but this is fine for features that
457
+ are mostly implemented in ``lldb``.
458
+
459
+ If you are running a full system using ``qemu-system``, you likely
460
+ want to connect to the ``lldb-server`` running within the userspace
461
+ of that system.
462
+
463
+ If your test program is bare metal (meaning it requires no supporting
464
+ operating system) then connect to the built in GDB stub. This can be
465
+ useful when testing embedded systems or kernel debugging.
466
+
467
+ Reducing Ptrace Related Bugs
468
+ ****************************
469
+
470
+ This section is written specifically for Linux, but the same can likely be done on
471
+ other Unix or Unix like operating systems.
472
+
473
+ Sometimes you will find ``lldb-server`` doing something with ptrace that causes
474
+ a problem. Your reproducer involves running ``lldb`` as well, which is not going
475
+ to go over well with kernel developers and is generally more difficult to explain if you
476
+ want to get help with it.
477
+
478
+ If you think you can get your point across without this, no need. If you're
479
+ pretty sure you have for example found a Linux Kernel bug, doing this greatly
480
+ increases the chances it'll get fixed.
481
+
482
+ We'll remove the LLDB dependency by making a smaller standalone program that
483
+ does the same actions. Starting with a skeleton program that forks and debugs
484
+ the inferior process.
485
+
486
+ The program presented `here <https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1>`_
487
+ (`source <https://github.com/eliben/code-for-blog/blob/master/2011/simple_tracer.c>`_)
488
+ is a great starting point. There is also an AArch64 specific example in
489
+ `the LLDB examples folder <https://github.com/llvm/llvm-project/tree/main/lldb/examples/ptrace_example.c>`_.
490
+
491
+ For either, you'll need to modify that to fit your architecture. A tip for this
492
+ is to take any constants used in it, find which function(s) use them in
493
+ LLDB and then you'll find the equivalent constants in the same LLDB functions
494
+ for your architecture.
495
+
496
+ Once that is running as expected we can convert ``lldb-server``'s ptrace calls into calls in
497
+ this program. To get a log of those, run ``lldb-server`` with
498
+ ``--log-channels "posix ptrace"``. You'll see output like:
499
+
500
+ ::
501
+
502
+ $ lldb-server gdbserver :1234 --log-channels "posix ptrace" -- /tmp/test.o
503
+ 1694099878.829990864 <...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0
504
+ 1694099878.830722332 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF7CC, 0x0000FFFFD14BF7D0, 16)=0x0
505
+ 1694099878.831967115 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF66C, 0x0000FFFFD14BF630, 16)=0xffffffffffffffff
506
+ 1694099878.831982136 <...> ptrace() failed: Invalid argument
507
+ Launched '/tmp/test.o' as process 2659963...
508
+
509
+ Each call is logged with its parameters and its result, shown after the ``=`` on the end.
510
+
511
+ From here you will need to use a combination of the `ptrace documentation <https://man7.org/linux/man-pages/man2/ptrace.2.html>`_
512
+ and Linux Kernel headers (``uapi/linux/ptrace.h`` mainly) to figure out what
513
+ the calls are.
514
+
515
+ The most important parameter is the first, which is the request number. In the
516
+ example above ``16896``, which is hex ``0x4200``, is ``PTRACE_SETOPTIONS``.
517
+
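Looking up request numbers by hand gets tedious for longer logs. The following is a minimal Python sketch, not part of LLDB, that extracts the request number from a log line and names it. The table holds only a subset of the generic Linux request values from ``uapi/linux/ptrace.h``, architecture specific requests are omitted, and the regular expression is an assumption based purely on the example log lines shown above. The function and variable names are hypothetical.

```python
import re

# Subset of generic Linux request numbers from uapi/linux/ptrace.h.
# Architecture specific requests are deliberately left out.
PTRACE_REQUESTS = {
    0: "PTRACE_TRACEME",
    1: "PTRACE_PEEKTEXT",
    2: "PTRACE_PEEKDATA",
    4: "PTRACE_POKETEXT",
    5: "PTRACE_POKEDATA",
    7: "PTRACE_CONT",
    8: "PTRACE_KILL",
    9: "PTRACE_SINGLESTEP",
    16: "PTRACE_ATTACH",
    17: "PTRACE_DETACH",
    0x4200: "PTRACE_SETOPTIONS",
    0x4201: "PTRACE_GETEVENTMSG",
    0x4202: "PTRACE_GETSIGINFO",
    0x4203: "PTRACE_SETSIGINFO",
    0x4204: "PTRACE_GETREGSET",
    0x4205: "PTRACE_SETREGSET",
    0x4206: "PTRACE_SEIZE",
    0x4207: "PTRACE_INTERRUPT",
}

# Assumed log shape: ptrace(<request>, <pid>, <hex>, <hex>, <size>)=<hex result>
LOG_RE = re.compile(
    r"ptrace\((\d+), (\d+), (0x[0-9A-Fa-f]+), (0x[0-9A-Fa-f]+), (\d+)\)"
    r"=(0x[0-9A-Fa-f]+)"
)


def describe(line: str):
    """Return a short description of a logged ptrace call, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    request = int(match.group(1))
    name = PTRACE_REQUESTS.get(request, "unknown request")
    return f"{name} (0x{request:x}) pid={match.group(2)} result={match.group(6)}"


print(describe(
    "<...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0"
))
```

Lines that are not ptrace calls, such as ``ptrace() failed: Invalid argument``, return ``None`` and can be skipped when filtering a log.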
518
+ Luckily, you don't usually have to figure out all those early calls. Our
519
+ skeleton program will be doing all that, successfully we hope.
520
+
521
+ What you should do is record just the interesting bit to you. Let's say
522
+ something odd is happening when you read the ``tpidr`` register (this is an
523
+ AArch64 register, just for example purposes).
524
+
525
+ First, go to the ``lldb-server`` terminal and press enter a few times to put
526
+ some blank lines after the last logging output.
527
+
528
+ Then go to your ``lldb`` and:
529
+
530
+ ::
531
+
532
+ (lldb) register read tpidr
533
+ tpidr = 0x0000fffff7fef320
534
+
535
+ You'll see this from ``lldb-server``:
536
+
537
+ ::
538
+
539
+ <...> ptrace(16900, 2659963, 0x0000FFFFD14BF6CC, 0x0000FFFFD14BF710, 8)=0x0
540
+
541
+ If you don't see that, it may be because ``lldb`` has cached it. The easiest way
542
+ to clear that cache is to step. Remember that some registers are read every
543
+ step, so you'll have to adjust depending on the situation.
544
+
545
+ Assuming you've got that line, you would look up what ``16900`` is. This is
546
+ ``0x4204`` in hex, which is ``PTRACE_GETREGSET``, as we expected.
547
+
548
+ The following parameters are not as we might expect because what we log is a bit
549
+ different from the literal ptrace call. See your platform's definition of
550
+ ``PtraceWrapper`` for the exact form.
551
+
552
+ The point of all this is that by doing a single action you can get a few
553
+ isolated ptrace calls and you can then fill in the blanks and write
554
+ equivalent calls in the skeleton program.
555
+
556
+ The final piece of this is likely breakpoints. Assuming your bug does not
557
+ require a hardware breakpoint, you can get software breakpoints by inserting
558
+ a break instruction into the inferior's code at compile time. Usually by using
559
+ an architecture specific assembly statement, as you will need to know exactly
560
+ how many instructions to overwrite later.
561
+
562
+ Doing it this way instead of exactly copying what LLDB does will save a few
563
+ ptrace calls. The AArch64 example program shows how to do this.
564
+
565
+ * The inferior contains ``BRK #0`` then ``NOP``.
566
+ * Two 4 byte instructions mean 8 bytes of data to replace, which matches the
567
+ minimum size you can write with ``PTRACE_POKETEXT``.
568
+ * The inferior runs to the ``BRK``, which brings us into the debugger.
569
+ * The debugger reads ``PC`` and writes ``NOP`` then ``NOP`` to the location
570
+ pointed to by ``PC``.
571
+ * The debugger then single steps the inferior to the next instruction
572
+ (this is not required in this specific scenario, you could just continue but
573
+ it is included because this more closely matches what ``lldb`` does).
574
+ * The debugger then continues the inferior.
575
+ * The inferior exits, and the whole program exits.
576
+
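To see why two instructions fit in one write, the 8 byte values involved in the steps above can be computed ahead of time. The following is a minimal Python sketch, not taken from the AArch64 example program. It assumes the standard A64 encodings ``0xD4200000`` for ``BRK #0`` and ``0xD503201F`` for ``NOP``, and a little endian 64-bit word as the unit written by ``PTRACE_POKETEXT``. The function name is hypothetical.

```python
import struct

# Standard A64 instruction encodings (each instruction is 4 bytes).
BRK_0 = 0xD4200000  # BRK #0
NOP = 0xD503201F    # NOP


def poke_word(first: int, second: int) -> int:
    """Pack two 4 byte instructions into one 8 byte word.

    'first' is the instruction at the lower address, so it ends up in the
    low 32 bits of the little endian word.
    """
    return int.from_bytes(struct.pack("<II", first, second), "little")


# What the inferior contains at the breakpoint location before patching.
original = poke_word(BRK_0, NOP)
# What the debugger writes to the location pointed to by PC.
replacement = poke_word(NOP, NOP)

print(f"original:    0x{original:016x}")
print(f"replacement: 0x{replacement:016x}")
```

The ``replacement`` value is what a skeleton debugger would pass as the data argument of its write, and reading the location back should return the same value if the write succeeded.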
577
+ Using this technique you can emulate the usual "run to main, do a thing" type
578
+ reproduction steps.
579
+
580
+ Finally, that "thing" is the ptrace calls you got from the ``lldb-server`` logs.
581
+ Add those to the debugger function and you now have a reproducer that doesn't
582
+ need any part of LLDB.