How to configure systemd to start a service automatically after a crash in Linux

A service managed by systemd can crash or go down on Linux for many reasons. You can investigate and fix the root cause, but that takes time.

One thing you can do immediately to bring the service back online is to have it start automatically when it goes down. This reduces downtime and helps keep the service available to your users.

This is easy to automate on systemd systems because systemd has built-in options for it.

It can also be done with a bash script. We have already written a bash script that automatically starts a service when it crashes on Linux.

What is systemd?

systemd is an init system and system manager that all the major Linux distributions have adopted in place of the traditional SysV init system.

systemd is compatible with SysV and LSB init scripts. It can work as a drop-in replacement for the sysvinit system.

systemd is the first process started by the kernel, and it holds PID 1.

It is the parent process of everything else on the system. Fedora 15 was the first distribution to adopt systemd, replacing Upstart.

systemctl is the command-line utility and primary tool for managing systemd daemons and services (start, restart, stop, enable, disable, reload, and status).

systemd uses .service files instead of the bash scripts that SysV init uses. It sorts all daemons into their own Linux cgroups, and you can see the hierarchy with the systemd-cgls command or by exploring the cgroup filesystem (typically mounted under /sys/fs/cgroup).

A systemd service file has three major sections. Add the parameters below under the [Service] section.

[Unit]
...

[Service]
Restart=on-failure
RestartSec=5s
...

[Install]
...
  • Restart: Configures whether the service is restarted when the service process exits, is killed, or a timeout is reached.
  • on-failure: The service is restarted when the process exits with a non-zero exit code, is terminated by a signal (other than SIGHUP, SIGINT, SIGTERM, or SIGPIPE, which count as a clean exit), when an operation (such as a service reload) times out, or when the configured watchdog timeout is triggered.
  • RestartSec: Configures the time to sleep before restarting a service. Takes a unit-less value in seconds, or a time span value such as “5min 20s”. Defaults to 100ms.
  • 5s: systemd waits 5 seconds and then starts the service again.
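To make the on-failure rule easier to follow, here is a small shell sketch that models the decision. It is a simplified illustration, not systemd's own logic: the function name and the kind labels (exit, signal, timeout, watchdog) are made up for this example, and it assumes default settings (no SuccessExitStatus= or similar overrides).

```shell
#!/bin/sh
# Simplified model of the Restart=on-failure rule; not systemd's actual code.
# Usage: would_restart_on_failure KIND [VALUE]
#   KIND is one of: exit (VALUE = exit code), signal (VALUE = signal name
#   without the SIG prefix), timeout, watchdog.
#   These labels are hypothetical and exist only in this sketch.
would_restart_on_failure() {
    kind=$1
    value=${2:-}
    case "$kind" in
        exit)
            # Exit code 0 is a clean exit; anything else counts as a failure.
            if [ "$value" -eq 0 ]; then echo no; else echo yes; fi
            ;;
        signal)
            # By default these four signals count as a clean shutdown.
            case "$value" in
                HUP|INT|TERM|PIPE) echo no ;;
                *) echo yes ;;
            esac
            ;;
        timeout|watchdog)
            echo yes
            ;;
        *)
            echo "unknown kind: $kind" >&2
            return 2
            ;;
    esac
}

would_restart_on_failure exit 0        # prints: no
would_restart_on_failure exit 1        # prints: yes
would_restart_on_failure signal KILL   # prints: yes
would_restart_on_failure signal TERM   # prints: no
```

The KILL case is the one exercised later in this article, where the httpd processes are killed with kill -9.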

How to add the auto-restart parameters to a systemd service?

Adding these parameters is simple. Open the corresponding service file and append the following parameters.

To demonstrate, we will use the httpd service.

# vi /etc/systemd/system/multi-user.target.wants/httpd.service

[Unit]
Description=Apache Web Server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=simple
ExecStart=/usr/bin/httpd -k start -DFOREGROUND
ExecStop=/usr/bin/httpd -k graceful-stop
ExecReload=/usr/bin/httpd -k graceful
PrivateTmp=true
LimitNOFILE=infinity
KillMode=mixed
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
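As an alternative to editing the unit file itself, systemd also reads drop-in files from a directory named after the unit (for example /etc/systemd/system/httpd.service.d/), so the restart settings can live in their own file and survive package updates. The sketch below is one possible way to write such a drop-in. The function name write_restart_dropin and the file name restart.conf are assumptions made for this example, and the demo call writes into a scratch directory so nothing on the system changes.

```shell
#!/bin/sh
# Write a drop-in that adds restart settings to a unit without touching
# the packaged unit file.
# Usage: write_restart_dropin UNIT [BASE_DIR]
#   BASE_DIR defaults to /etc/systemd/system. The function name and the
#   file name restart.conf are choices made for this sketch.
write_restart_dropin() {
    unit=$1
    base_dir=${2:-/etc/systemd/system}
    dropin_dir="$base_dir/$unit.d"

    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
    # Print the path of the file that was written.
    echo "$dropin_dir/restart.conf"
}

# Dry run in a scratch directory.
scratch=$(mktemp -d)
write_restart_dropin httpd.service "$scratch"
cat "$scratch/httpd.service.d/restart.conf"
```

For real use, run write_restart_dropin httpd.service as root (without the second argument) and then reload systemd with systemctl daemon-reload, the same as after a direct edit.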

You need to reload the systemd manager configuration after making the changes. Until you do, “systemctl status httpd” shows a warning on the last line of its output, as shown below.

The warning is highlighted in color for better visibility. Also note that the Apache httpd web server was started 27 minutes ago.

# systemctl status httpd

● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-05 16:45:24 CDT; 27min ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 14420 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
Main PID: 14424 (httpd)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─14424 /usr/sbin/httpd -DFOREGROUND
           ├─14425 /usr/sbin/httpd -DFOREGROUND
           ├─14426 /usr/sbin/httpd -DFOREGROUND
           ├─14427 /usr/sbin/httpd -DFOREGROUND
           ├─14428 /usr/sbin/httpd -DFOREGROUND
           └─14429 /usr/sbin/httpd -DFOREGROUND

Aug 05 16:45:23 thvtstrhl7 systemd[1]: Stopped The Apache HTTP Server.
Aug 05 16:45:23 thvtstrhl7 systemd[1]: Starting The Apache HTTP Server...
Aug 05 16:45:24 thvtstrhl7 systemd[1]: Started The Apache HTTP Server.
Warning: httpd.service changed on disk. Run 'systemctl daemon-reload' to reload units.

Reload the systemd manager configuration.

# systemctl daemon-reload

The warning is gone now. You can verify this by running the following command once again.

# systemctl status httpd

● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-05 16:45:24 CDT; 27min ago
     Docs: man:httpd(8)
           man:apachectl(8)
Main PID: 14424 (httpd)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─14424 /usr/sbin/httpd -DFOREGROUND
           ├─14425 /usr/sbin/httpd -DFOREGROUND
           ├─14426 /usr/sbin/httpd -DFOREGROUND
           ├─14427 /usr/sbin/httpd -DFOREGROUND
           ├─14428 /usr/sbin/httpd -DFOREGROUND
           └─14429 /usr/sbin/httpd -DFOREGROUND

Aug 05 16:45:23 thvtstrhl7 systemd[1]: Stopped The Apache HTTP Server.
Aug 05 16:45:23 thvtstrhl7 systemd[1]: Starting The Apache HTTP Server...
Aug 05 16:45:24 thvtstrhl7 systemd[1]: Started The Apache HTTP Server.
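The status output does not show the restart settings themselves. One way to confirm that systemd picked them up is to read the unit's properties with systemctl show, which prints KEY=value lines. The helper below is a small sketch for pulling one value out of that output. The name get_prop is made up for this example, and the sample text only illustrates the shape of the output; it was not captured from the test machine.

```shell
#!/bin/sh
# Print the value of one KEY from "KEY=value" lines read on stdin,
# which is the format "systemctl show" prints.
# Usage: get_prop KEY < input
get_prop() {
    sed -n "s/^$1=//p"
}

# Illustrative sample. On a live system, pipe in the real output instead:
#   systemctl show httpd -p Restart -p RestartUSec | get_prop Restart
sample='Restart=on-failure
RestartUSec=5s'

printf '%s\n' "$sample" | get_prop Restart       # prints: on-failure
printf '%s\n' "$sample" | get_prop RestartUSec   # prints: 5s
```

Note that the RestartSec= setting from the unit file appears under the property name RestartUSec in systemctl show.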

To test the auto-restart, use the pidof command to find the PIDs of the httpd processes. Linux offers several other ways to find a process ID (PID) as well.

# pidof httpd

14429 14428 14427 14426 14425 14424

Once you have the PIDs, kill them all in one go using the following command. Several similar commands are available in Linux to kill a process by its PID.

# kill -9 14429 14428 14427 14426 14425 14424

After killing the httpd processes, run the following command to check the status. It shows that the service is in the auto-restart state, but it is not up yet.

# systemctl status httpd

● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2019-08-05 17:14:26 CDT; 2s ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 15978 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
  Process: 14424 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=killed, signal=KILL)
Main PID: 14424 (code=killed, signal=KILL)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic:   0 B/sec"

Aug 05 17:14:26 thvtstrhl7 systemd[1]: httpd.service: control process exited, code=exited status=1
Aug 05 17:14:26 thvtstrhl7 systemd[1]: Unit httpd.service entered failed state.
Aug 05 17:14:26 thvtstrhl7 systemd[1]: httpd.service failed.
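The “code=killed, signal=KILL” entry above is what makes this an unclean exit. The effect of kill -9 can be reproduced safely without touching a real service. The sketch below uses a throwaway sleep process as a stand-in for httpd and reports the status the shell sees, which is 128 plus the signal number. The function name kill_and_report is made up for this example.

```shell
#!/bin/sh
# Start a throwaway background process, kill it with SIGKILL, and report
# the status the shell observed. "sleep 60" stands in for a service process.
kill_and_report() {
    sleep 60 &
    pid=$!
    kill -9 "$pid"
    # wait returns 128 + signal number when the child died from a signal.
    status=0
    wait "$pid" || status=$?
    echo "status=$status signal=$((status - 128))"
}

kill_and_report   # prints: status=137 signal=9
```

Some shells also print a “Killed” notice on standard error when the background process dies; that is expected.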

Run “systemctl status httpd” once again after the 5-second RestartSec delay has passed. The service is running again, so the auto-restart works as expected.

It was started 564 milliseconds ago.

# systemctl status httpd

● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-05 17:14:31 CDT; 564ms ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 15978 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
Main PID: 15987 (httpd)
   Status: "Processing requests..."
   CGroup: /system.slice/httpd.service
           ├─15987 /usr/sbin/httpd -DFOREGROUND
           ├─15988 /usr/sbin/httpd -DFOREGROUND
           ├─15989 /usr/sbin/httpd -DFOREGROUND
           ├─15990 /usr/sbin/httpd -DFOREGROUND
           ├─15991 /usr/sbin/httpd -DFOREGROUND
           └─15992 /usr/sbin/httpd -DFOREGROUND

Aug 05 17:14:31 thvtstrhl7 systemd[1]: Starting The Apache HTTP Server...
Aug 05 17:14:31 thvtstrhl7 systemd[1]: Started The Apache HTTP Server.
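Instead of re-running the status command by hand, you can poll until the service is active again. Below is a minimal sketch of such a loop. The function name wait_until and the one-second polling interval are choices made for this example, and the runnable line uses true as a stand-in so the sketch works on a machine without httpd.

```shell
#!/bin/sh
# Retry a command once per second until it succeeds or the timeout expires.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    remaining=$1
    shift
    while ! "$@"; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# For the httpd test in this article, the call would look like:
#   wait_until 10 systemctl is-active --quiet httpd && echo "httpd is back"
# Stand-in command so the sketch runs anywhere:
wait_until 3 true && echo "condition met"   # prints: condition met
```

Pick a timeout somewhat longer than RestartSec, since the service cannot come back before that delay has passed.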

The same can be done for any service as required. I hope this article helps you.

Magesh Maruthamuthu
