Troubleshooting and Debugging
- Troubleshooting : the process of identifying, analyzing and solving problems.
-
Debugging : the process of identifying, analyzing and removing bugs in a system.
-
Troubleshooting is for infrastructure
- tcpdump, wireshark
- strace, ltrace
- ps, top etc
- Debugging : is for Software application
- Debugger : follows the code line by line , inspect changes in variable assignments, interrupt the program when a specific condition is met and more
Steps to solve any issue
- Getting Information to understand the problem
- Isolate and finding the root cause
- Performing the necessary remediation
- Document what we do
- The different things we tested to try
- Figure out the root cause.
- The steps we took to fix the issue.
strace
& ltrace
// TODO complete this section x
strace : to trace system calls made by the program
- o file.strace : to save output of trace to a file
Main steps to solving any issue ?
Questions to Ask before debugging
- Isolating the root cause is super important
Reproduction Case
Refer logging
- Linux // TODO get types of logs to investigate in linux
- /var/log/syslog
- .xsession-errors
- MacOS : Library/Logs/
- Windows : EventViewer
Our solution dont come up by wandering about things, we have to look at information to plug things into our knowledge graph, looking at error messages or documentation.
- Get a reproduction, try to isolate the problem
- Understanding the root cause is super important
Usefull Commands
top
& uptime
Load Average
- Load average : amount of time the process is busy in a minute
- Load average 1 means it was busy for whole min.
- shouldn’t be above the amount of process in computer
- The 3 values indicate
1 min
,5 min
,15 min
- must be less than CPU cores.
#top
top - 15:24:23 up 1:53, 1 user, load average: 0.08, 0.06, 0.06
Tasks: 44 total, 1 running, 43 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.2 us, 2.2 sy, 0.0 ni, 93.3 id, 1.1 wa, 0.0 hi, 1.1 si, 0.0 st
MiB Mem : 1971.8 total, 845.8 free, 961.2 used, 351.5 buff/cache
MiB Swap: 1024.0 total, 1019.7 free, 4.3 used. 1010.6 avail Mem
iostat
:
system monitoring tool that reports CPU and I/O statistics for devices and partitions.
Linux 5.15.167.4-microsoft-standard-WSL2 (skynet-e14-r3) 02/16/25 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.24 0.00 0.24 0.06 0.00 99.45
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
/dev/sda 0.17 11.45 0.00 0.00 74445 0 0
/dev/sdb 0.03 0.35 0.64 0.00 2292 4184 0
/dev/sdc 12.81 294.03 297.25 197.97 1911161 1932064 1286760
iotop
:
Utility that displays real-time I/O usage by processes or threads on a Linux system.
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE> COMMAND 1 be/4 root 0.00 B/s 0.00 B/s init
2 be/4 root 0.00 B/s 0.00 B/s init
9 be/4 root 0.00 B/s 0.00 B/s init [Interop]
7 be/4 root 0.00 B/s 0.00 B/s plan9 --control-socket 7 --log-level 4 --server-fd 8 --pipe-fd 10 --log-truncate
8 be/4 root 0.00 B/s 0.00 B/s plan9 --control-socket 7 --log-level 4 --server-fd 8 --pipe-fd 10 --log-truncate
vmstat
:
Vmstat is a performance monitoring tool that provides information about processes, memory, paging, block I/O, traps, and CPU activity.
- imp parameter : wa
-> means wait time
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
0 0 4388 236800 3740 532620 0 0 187 174 228 0 0 0 99 0 0 0
iftop
:
iftop is a network monitoring tool that displays bandwidth usage on an interface by host.
-
time
: to calculate time taken by program to complete -
kill
command // TODO get types of kill commands
Like isolating causes, understanding error messages, adding logging information, and generating new ideas for possible failures.