Sysdig – Exploration and troubleshooting tool for Linux systems

Sysdig is an open source, system exploration/diagnosing and troubleshooting tool for Linux with native container support. It  captures system state and activity from a running Linux instance, then save, filter and analyze.

Sysdig was written in Lua program language and includes a simple, intuitive, powerful and fully customizable curses UI (User Interface) called Csysdig.

This tool included all the major Linux troubleshooting commands like htop, iftop, lsof, strace, iostat, ps, netstat, tcpdump, etc,., into one single application so we can use this tool for any troubleshooting activity instead of going with each of above commands.

How to install Sysdig in Linux ?

To install sysdig in one step, simply run the following command as root or with sudo.

$ curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

How to install Sysdig on Debian based systems ?

sysdig package is included in the latest versions of Debian and Ubuntu systems, however, sysdig is updated with new functionality all the time, so I would recommend you to follow the below installation.

Trust the Sysdig package Draios GPG key

$ sudo curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public | apt-key add -
$ sudo curl -s -o /etc/apt/sources.list.d/draios.list http://download.draios.com/stable/deb/draios.list
$ sudo apt-get update

Install kernel headers

$ sudo apt-get -y install linux-headers-$(uname -r)

Install sysdig using APT command or APT-GET command

$ sudo apt-get -y install sysdig

How to install Sysdig on RHEL based systems ?

Follow the below procedure to install sysdig system exploration/diagnosing and troubleshooting tool on RHEL based systems such as CentOS, RHEL and Fedora.

Trust the Sysdig package Draios GPG key using RPM command.

$ sudo rpm --import https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public
$ sudo curl -s -o /etc/yum.repos.d/draios.repo http://download.draios.com/stable/rpm/draios.repo

Install EPEL Repository, in order to install DKMS on RHEL & CentOS systems.

Suggested Read : Install/Enable EPEL Repository on RHEL & CentOS

Install kernel headers

$ sudo yum -y install kernel-devel-$(uname -r)

Install sysdig using YUM command.

$ sudo yum -y install sysdig

How to install Sysdig on Arch Linux based systems ?

Sysdig package is available in Arch repository so we can easily install with help of pacman command.

$ sudo pacman -S linux-headers
$ sudo pacman -S sysdig

How to use Sysdig ?

Once the installation is complete, just run the sysdig command without any argument and instantly your screen will be flooded with lots of output and you can’t understand anything so, Its better use more command to see page by page.

# sysdig | more
53 12:56:15.084694728 8 sysdig (16511) > switch next=0 pgft_maj=0 pgft_min=1279 vm_size=557468 vm_rss=3224 vm_swap=0
54 12:56:15.085007056 8  (0) > switch next=37 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
55 12:56:15.085015656 8  (37) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
56 12:56:15.089027840 8  (0) > switch next=2574 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
57 12:56:15.089038022 8  (2574) > switch next=37 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
58 12:56:15.089049754 8  (37) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
59 12:56:15.114770934 8  (0) > switch next=16511(sysdig) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
63 12:56:15.114978525 8 sysdig (16511) > switch next=2574 pgft_maj=0 pgft_min=1305 vm_size=557468 vm_rss=3328 vm_swap=0
64 12:56:15.114991039 8  (2574) > switch next=16511(sysdig) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
75 12:56:15.115096876 9  (0) > switch next=16512(more) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
80 12:56:15.115119616 9 more (16512) < read res=220 data=53 12:56:15.084694728 8 sysdig (16511) > switch next=0 pgft_maj=0 pgft_min=1279
89 12:56:15.115157413 9 more (16512) > open
95 12:56:15.115177276 9 more (16512) < open fd=3(/usr/lib64/gconv/gconv-modules.cache) name=/usr/lib64/gconv/gconv-modules.cache flags=1(O_
RDONLY) mode=0
97 12:56:15.115180880 9 more (16512) > fstat fd=3(/usr/lib64/gconv/gconv-modules.cache)
98 12:56:15.115182930 9 more (16512) < fstat res=0 99 12:56:15.115183930 9 more (16512) > mmap addr=0 length=26060 prot=1(PROT_READ) flags=1(MAP_SHARED) fd=3(/usr/lib64/gconv/gconv-modules.c
ache) offset=0
101 12:56:15.115192296 9 more (16512) < mmap res=7F43625D1000 vm_size=105188 vm_rss=752 vm_swap=0 103 12:56:15.115193276 9 more (16512) > close fd=3(/usr/lib64/gconv/gconv-modules.cache)
104 12:56:15.115194500 9 more (16512) < close res=0 106 12:56:15.115211114 8 sysdig (16511) > switch next=0 pgft_maj=0 pgft_min=1308 vm_size=557472 vm_rss=3340 vm_swap=0
107 12:56:15.115244874 9 more (16512) > fstat fd=1(/dev/pts/0)
108 12:56:15.115245774 9 more (16512) < fstat res=0 109 12:56:15.115246411 9 more (16512) > mmap addr=0 length=4096 prot=3(PROT_READ|PROT_WRITE) flags=10(MAP_PRIVATE|MAP_ANONYMOUS) fd=4294967295
 offset=0
110 12:56:15.115249114 9 more (16512) < mmap res=7F43625D0000 vm_size=105192 vm_rss=804 vm_swap=0 111 12:56:15.115270487 9 more (16512) > write fd=1(/dev/pts/0) size=118
112 12:56:15.115287274 9 more (16512) < write res=118 data=53 12:56:15.084694728 8 sysdig (16511) > switch next=0 pgft_maj=0 pgft_min=1279
113 12:56:15.115311317 9 more (16512) > write fd=1(/dev/pts/0) size=102
114 12:56:15.115315021 9 more (16512) < write res=102 data=54 12:56:15.085007056 8  (0) > switch next=37 pgft_maj=0 pgft_min=0 vm_size=
115 12:56:15.115316804 9 more (16512) > read fd=0(

pipe:[546341]) size=4096 116 12:56:15.115319227 9 more (16512) < read res=526 data=55 12:56:15.085015656 8 (37) > switch next=0 pgft_maj=0 pgft_min=0 vm_size= 117 12:56:15.115342354 9 more (16512) > write fd=1(/dev/pts/0) size=102 –More–

Add -w flag in the sysdig command to save the output into file for later analyze. To stop the command, just hit CTRL+C.

# sysdig -w output.dump

Add -r flag in the sysdig command to read the .dump file.

# sysdig -r output.dump

By default sysdig command stores the output in binary file, add -A flag to save sysdig output in ASCII.

# sysdig -A > /path to filename.txt

What’s Sysdig filter ?

sysdig command has filters that allow you to filter the output to specific information like cpu, memory, mysql, crond, sshd, apache, etc,.,

Run the following command to get the list of available filters.

# sysdig -l | more

----------------------
Field Class: fd

fd.num          the unique number identifying the file descriptor.
fd.type         type of FD. Can be 'file', 'directory', 'ipv4', 'ipv6', 'unix',
                 'pipe', 'event', 'signalfd', 'eventpoll', 'inotify' or 'signal
                fd'.
fd.typechar     type of FD as a single character. Can be 'f' for file, 4 for IP
                v4 socket, 6 for IPv6 socket, 'u' for unix socket, p for pipe,
                'e' for eventfd, 's' for signalfd, 'l' for eventpoll, 'i' for i
                notify, 'o' for unknown.
fd.name         FD full name. If the fd is a file, this field contains the full
                 path. If the FD is a socket, this field contain the connection
                 tuple.
fd.directory    If the fd is a file, the directory that contains it.
fd.filename     If the fd is a file, the filename without the path.
fd.ip           matches the ip address (client or server) of the fd.
fd.cip          client IP address.
fd.sip          server IP address.
fd.lip          local IP address.
fd.rip          remote IP address.
fd.port         (FILTER ONLY) matches the port (either client or server) of the
                 fd.
fd.cport        for TCP/UDP FDs, the client port.
fd.sport        for TCP/UDP FDs, server port.
fd.lport        for TCP/UDP FDs, the local port.
fd.rport        for TCP/UDP FDs, the remote port.
fd.l4proto      the IP protocol of a socket. Can be 'tcp', 'udp', 'icmp' or 'ra
                w'.
fd.sockfamily   the socket family for socket events. Can be 'ip' or 'unix'.
fd.is_server    'true' if the process owning this FD is the server endpoint in
                the connection.
fd.uid          a unique identifier for the FD, created by chaining the FD numb
--More--

For example, we are going to filter crond activity.

# sysdig -r output.dump proc.name=crond | more
1276 13:00:01.595766726 10 crond (7668) < nanosleep res=0 1277 13:00:01.595782313 10 crond (7668) > stat
1278 13:00:01.595792330 10 crond (7668) < stat res=0 path=/etc/localtime 1279 13:00:01.595802410 10 crond (7668) > select
1280 13:00:01.595809863 10 crond (7668) < select res=0 1281 13:00:01.595840453 10 crond (7668) > open
1282 13:00:01.595848560 10 crond (7668) < open fd=6(/etc/passwd) name=/etc/passwd flags=4097(O_RDONLY|O_CLOEXEC) mode=0
1283 13:00:01.595857870 10 crond (7668) > fstat fd=6(/etc/passwd)
1284 13:00:01.595859280 10 crond (7668) < fstat res=0 1285 13:00:01.595860206 10 crond (7668) > mmap addr=0 length=4096 prot=3(PROT_READ|PROT_WRITE) flags=10(MAP_PRIVATE|MAP_ANONYMOUS) fd=42949672
95 offset=0
1286 13:00:01.595866736 10 crond (7668) < mmap res=7F3CCB758000 vm_size=118492 vm_rss=1248 vm_swap=0 1287 13:00:01.595868513 10 crond (7668) > read fd=6(/etc/passwd) size=4096
1288 13:00:01.595884886 10 crond (7668) < read res=1556 data=root:x:0:0:root:/root:/bin/bash.bin:x:1:1:bin:/bin:/sbin/nologin.daemon:x:2:2:da 1289 13:00:01.595895390 10 crond (7668) > close fd=6(/etc/passwd)
1290 13:00:01.595901726 10 crond (7668) < close res=0 1291 13:00:01.595902576 10 crond (7668) > munmap addr=7F3CCB758000 length=4096
1292 13:00:01.595913143 10 crond (7668) < munmap res=0 vm_size=118488 vm_rss=1248 vm_swap=0 1293 13:00:01.595930350 10 crond (7668) > clone
1294 13:00:01.596161710 10 crond (7668) < clone res=16623(crond) exe=crond args= tid=7668(crond) pid=7668(crond) ptid=1(init) cwd= fdlimit=102 4 pgft_maj=0 pgft_min=30188 vm_size=118488 vm_rss=1248 vm_swap=0 comm=crond cgroups=cpuset=/.ns=/.cpu_cgroup=/.cpuacct=/.mem_cgroup=/.devices= /.freezer=/.net_cls... flags=0 uid=0 gid=0 vtid=7668(crond) vpid=7668(crond) 1295 13:00:01.596202293 10 crond (7668) > rt_sigprocmask
1296 13:00:01.596204780 10 crond (7668) < rt_sigprocmask 1297 13:00:01.596206273 10 crond (7668) > rt_sigaction
1298 13:00:01.596207796 10 crond (7668) < rt_sigaction 1299 13:00:01.596208480 10 crond (7668) > rt_sigprocmask
1300 13:00:01.596209043 10 crond (7668) < rt_sigprocmask 1302 13:00:01.596209900 10 crond (7668) > nanosleep interval=60000000000(60s)
1303 13:00:01.596214546 10 crond (7668) > switch next=2576 pgft_maj=0 pgft_min=30194 vm_size=118488 vm_rss=1248 vm_swap=0
1304 13:00:01.596223563 17 crond (16623) < clone res=0 exe=crond args= tid=16623(crond) pid=16623(crond) ptid=7668(crond) cwd= fdlimit=1024 pg ft_maj=0 pgft_min=1 vm_size=118488 vm_rss=608 vm_swap=0 comm=crond cgroups=cpuset=/.ns=/.cpu_cgroup=/.cpuacct=/.mem_cgroup=/.devices=/.freezer =/.net_cls... flags=0 uid=0 gid=0 vtid=16623(crond) vpid=16623(crond) 1306 13:00:01.596302073 17 crond (16623) > close fd=3(/var/run/crond.pid)
--More--

To print any live process activity, use the following format and change the process name which you want to monitor.

# sysdig proc.name=sshd | more
436 02:32:23.143994303 0 sshd (11004) < select res=0 437 02:32:23.144025889 0 sshd (11004) > socket domain=16(AF_ROUTE) type=3 proto=9
438 02:32:23.144059009 0 sshd (11004) < socket fd=7()
439 02:32:23.144061098 0 sshd (11004) > fcntl fd=7() cmd=3(F_SETFD)
440 02:32:23.144063523 0 sshd (11004) < fcntl res=0(/dev/null)
441 02:32:23.144143953 0 sshd (11004) > socket domain=16(AF_ROUTE) type=3 proto=0
442 02:32:23.144151631 0 sshd (11004) < socket fd=8()
443 02:32:23.144152668 0 sshd (11004) > bind fd=8()
444 02:32:23.144155571 0 sshd (11004) < bind res=0 addr=NULL 445 02:32:23.144157437 0 sshd (11004) > getsockname
446 02:32:23.144165665 0 sshd (11004) < getsockname 447 02:32:23.144171017 0 sshd (11004) > gettimeofday
448 02:32:23.144172228 0 sshd (11004) < gettimeofday 449 02:32:23.144173489 0 sshd (11004) > sendto fd=8() size=20 tuple=NULL
450 02:32:23.144244181 0 sshd (11004) < sendto res=20 data=........w..Y........ 451 02:32:23.144245835 0 sshd (11004) > recvmsg fd=8()
452 02:32:23.144257486 0 sshd (11004) < recvmsg res=108 size=108 data=0.......w..Y.*..............................lo..<.......w..Y.*... ..........BF.. tuple=NULL

To filter an events for a specific file, use fd.name filter. This will shows what processes are reading or writing on this file alone.

# sysdig fd.name=/var/log/mysqld.log
21518 02:34:02.962639968 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=61
21519 02:34:02.962662647 0 mysqld (11061) < write res=61 data=170809 2:34:02 [Note] /usr/libexec/mysqld: Normal shutdown.. 21735 02:34:02.963647697 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=68
21736 02:34:02.963654268 0 mysqld (11061) < write res=68 data=170809 2:34:02 [Note] Event Scheduler: Purging the queue. 0 events. 21771 02:34:02.964178502 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=15
21772 02:34:02.964185382 0 mysqld (11061) < write res=15 data=170809 2:34:02 21773 02:34:02.964187559 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=31
21774 02:34:02.964190290 0 mysqld (11061) < write res=31 data= InnoDB: Starting shutdown.... 22712 02:34:04.599886181 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=15
22713 02:34:04.599898808 0 mysqld (11061) < write res=15 data=170809 2:34:04 22714 02:34:04.599906259 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=58
22715 02:34:04.599908679 0 mysqld (11061) < write res=58 data= InnoDB: Shutdown completed; log sequence number 0 44233. 22732 02:34:04.600531200 0 mysqld (11061) > write fd=2(/var/log/mysqld.log) size=63
22733 02:34:04.600545615 0 mysqld (11061) < write res=63 data=170809  2:34:04 [Note] /usr/libexec/mysqld: Shutdown complete..
23098 02:34:04.603462535 0 mysqld_safe (10882) < open fd=3(/var/log/mysqld.log) name=/var/log/mysqld.log flags=14(O_APPEND|O_CREAT|O_WRONLY) mode=0666
23107 02:34:04.603467723 0 mysqld_safe (10882) > dup fd=3(/var/log/mysqld.log)
23108 02:34:04.603468305 0 mysqld_safe (10882) < dup res=1(/var/log/mysqld.log)
23109 02:34:04.603468671 0 mysqld_safe (10882) > close fd=3(/var/log/mysqld.log)
23110 02:34:04.603468976 0 mysqld_safe (10882) < close res=0 23111 02:34:04.603471050 0 mysqld_safe (10882) > write fd=1(/var/log/mysqld.log) size=82
23112 02:34:04.603477718 0 mysqld_safe (10882) < write res=82 data=170809 02:34:04 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ende

What’s Chisels & How to use it ?

Before going to expriment sysdig tool, try to understand what are the chisels available to perform. Sysdig’s chisels are little scripts that analyze the sysdig event stream to perform useful actions.

11 category chisels are available and each category has multiple parameters

  • Application
  • CPU Usage
  • Errors
  • I/O
  • Logs
  • Misc
  • Net
  • Performance
  • Security
  • System State
  • Tracers

After running the above command, you will get a short description for each of the available chisels.

# sysdig -cl

Category: Application
---------------------
httplog         HTTP requests log
httptop         Top HTTP requests
memcachelog     memcached requests log

Category: CPU Usage
-------------------
spectrogram     Visualize OS latency in real time.
subsecoffset    Visualize subsecond offset execution time.
topcontainers_cpu
                Top containers by CPU usage
topprocs_cpu    Top processes by CPU usage

Category: Errors
----------------
topcontainers_error
                Top containers by number of errors
topfiles_errors Top files by number of errors
topprocs_errors top processes by number of errors

Category: I/O
-------------
echo_fds        Print the data read and written by processes.
fdbytes_by      I/O bytes, aggregated by an arbitrary filter field
fdcount_by      FD count, aggregated by an arbitrary filter field
fdtime_by       FD time group by
iobytes         Sum of I/O bytes on any type of FD
iobytes_file    Sum of file I/O bytes
spy_file        Echo any read/write made by any process to all files. Optionall
                y, you can provide the name of one file to only intercept reads
                /writes to that file.
stderr          Print stderr of processes
stdin           Print stdin of processes
stdout          Print stdout of processes
topcontainers_file
                Top containers by R+W disk bytes
topfiles_bytes  Top files by R+W bytes
topfiles_time   Top files by time
topprocs_file   Top processes by R+W disk bytes

Category: Logs
--------------
spy_logs        Echo any write made by any process to a log file. Optionally, e
                xport the events around each log message to file.
spy_syslog      Print every message written to syslog. Optionally, export the e
                vents around each syslog message to file.

To know detailed description and the list of arguments for one of the chisels, use the –i command line switch.

# sysdig -i httplog

Category: Application
---------------------
httplog         HTTP requests log

Show a log of all HTTP requests
Args:
(None)

To run one of the chisels, use the –c flag.

# sysdig -c topprocs_cpu
CPU%                Process             PID
--------------------------------------------------------------------------------
0.00%               udevd               485
0.00%               sysdig              11204
0.00%               sshd                1696
0.00%               sshd                10754
0.00%               mysqld              11182
0.00%               sshd                11011
0.00%               httpd               10733
0.00%               init                1
0.00%               master              1571
0.00%               hald                1234

If you want to apply filter in Chisels output, use the following format.

# sysdig -c topprocs_cpu "proc.name contains sshd"
CPU%                Process             PID
--------------------------------------------------------------------------------
0.00%               sshd                10754
0.00%               sshd                1492
0.00%               sshd                11011
0.00%               sshd                1696

How to monitor containers ?

Sysdig especially designed for container monitoring, run the following command to install sysdig container. This method allows full visibility into all containers on the host OS.

In order to monitor container, make sure you have already installed docker on your system, refer the following URL’s for Docker management.

Suggested Read :
(#) How to Install and Configure Docker on Linux
(#) How to play with Docker images on Linux
(#) How to play with Docker containers on Linux
(#) How to Install, Run Applications inside Docker Containers

For Debian based systems

# apt-get -y install linux-headers-$(uname -r)

For RHEL based systems

# yum -y install kernel-devel-$(uname -r)

Download sysdig container image.

# docker pull sysdig/sysdig

Run sysdig container.

# docker run -i -t --name sysdig --privileged -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro sysdig/sysdig

For testing purpose, we are running one more container tender_brattain along with sysdig container.

# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
271676e724b3        sysdig/sysdig       "/docker-entrypoint.   10 minutes ago      Up 10 minutes                           sysdig
cb8c2c5125b8        ubuntu              "/bin/bash"            13 minutes ago      Up 13 minutes      

Run the following command to see the host system & container activity.

# sysdig -pc -c topprocs_cpu
CPU%                Process             Host_pid            Container_pid       container.name
--------------------------------------------------------------------------------
4.00%               sysdig              1666                1666                host
2.00%               bash                31709               1                   tender_brattain
2.00%               docker              31005               31005               host
1.00%               docker              31648               31648               host
0.00%               init                1                   1                   host
0.00%               crond               1612                1612                host
0.00%               rsyslogd            1133                1133                host
0.00%               top                 599                 882                 sysdig
0.00%               auditd              1111                1111                host
0.00%               docker              32075               32075               host

To check network connection with host and container

# sysdig -pc -c topconns
Bytes               container.name      Proto               Conn
--------------------------------------------------------------------------------
7.64KB              host                tcp                 103.5.134.167:56249->66.70.189.137:22
528B                host                tcp                 103.5.134.167:56623->66.70.189.137:22
210B                tender_brattain     udp                 172.17.0.3:35363->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:54557->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:43212->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:42133->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:56387->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:50810->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:36668->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:43161->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:51875->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:33729->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:54421->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:33679->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:37343->213.186.33.99:53
210B                tender_brattain     udp                 172.17.0.3:42390->213.186.33.99:53

Leave a Reply

Your email address will not be published. Required fields are marked *