[STOR-1395] StoRM Backend service enters failed state when stopped Created: 16/Apr/21 Updated: 27/May/21 Resolved: 27/Apr/21 |
|
Status: | Closed |
Project: | StoRM |
Component/s: | backend |
Affects Version/s: | 1.11.20 |
Fix Version/s: | 1.11.21 |
Security Level: | Public (Visible by non-authn users.) |
Type: | Bug | Priority: | Major |
Reporter: | Enrico Vianello | Assignee: | Enrico Vianello |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
Starting the backend service works fine, but when it is stopped the service is left in a failed state:
Apr 16 16:06:07 omii005-vm03.cnaf.infn.it systemd[1]: storm-backend-server.service: main process exited, code=exited, status=143/n/a
Apr 16 16:06:07 omii005-vm03.cnaf.infn.it systemd[1]: Stopped StoRM Backend service.
Apr 16 16:06:07 omii005-vm03.cnaf.infn.it systemd[1]: Unit storm-backend-server.service entered failed state.
Apr 16 16:06:07 omii005-vm03.cnaf.infn.it systemd[1]: storm-backend-server.service failed.
Apr 16 16:06:07 omii005-vm03.cnaf.infn.it systemd[1]: Started StoRM Backend service.
Exit code 143 (128 + 15, where 15 is the number of SIGTERM) means that the program received a SIGTERM signal instructing it to exit. The JVM catches the signal and shuts down cleanly, i.e. it runs all registered shutdown hooks (StoRM Backend registers one that stops several threads and services), but it still exits with status 143. That's just how Java works. Since systemd treats only exit status 0 as a successful exit by default, it marks the unit as failed. We should be able to suppress this by declaring 143 as a "success" exit status in the unit file:
[Service]
SuccessExitStatus=143 |
Comments |
Comment by Enrico Vianello [ 27/Apr/21 ] |
https://github.com/italiangrid/storm/commit/5a142d9a49beb8d9eb64c40a6d1b8d88618e0521 |
Comment by Andrea Ceccanti [ 20/Apr/21 ] |
Ok, I expected that the unit included SuccessExitStatus=143, which it doesn't. |
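Whether systemd flags the unit as failed depends on how it classifies the main process's exit. The function below is a simplified, hypothetical model of the rule documented in systemd.service(5): exit status 0 is success, the signals SIGHUP, SIGINT, SIGTERM and SIGPIPE count as a clean stop (except for `Type=oneshot`), and `SuccessExitStatus=` adds further exit statuses. It deliberately ignores the signal names that `SuccessExitStatus=` may also list, and the name `is_clean_exit` is illustrative, not part of any systemd API.

```python
import signal

# Signals systemd treats as a clean stop of the main process
# (except for Type=oneshot), per systemd.service(5).
CLEAN_SIGNALS = frozenset(
    {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}
)


def is_clean_exit(code: str, status: int, success_exit_status=frozenset()) -> bool:
    """Approximate systemd's verdict on a main-process exit.

    code: "exited" or "killed", as in the journal's "code=..." field.
    status: the exit status (for "exited") or signal number (for "killed").
    success_exit_status: numeric exit statuses listed in SuccessExitStatus=.
    """
    if code == "exited":
        # Only 0 is a success unless the unit lists extra statuses.
        return status == 0 or status in success_exit_status
    if code == "killed":
        return status in CLEAN_SIGNALS
    return False  # e.g. "dumped" (core dump)


if __name__ == "__main__":
    # The journal line "code=exited, status=143" without and with the fix.
    print(is_clean_exit("exited", 143))
    print(is_clean_exit("exited", 143, {143}))
```

Without `SuccessExitStatus=143`, a JVM stopped via SIGTERM falls into the "exited" branch with a non-zero status and is reported as failed; listing 143 turns the same exit into a success.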
Comment by Enrico Vianello [ 20/Apr/21 ] |
[root@transfer-test ~]# systemctl status storm-webdav
● storm-webdav.service - StoRM WebDAV service
Loaded: loaded (/usr/lib/systemd/system/storm-webdav.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/storm-webdav.service.d
└─filelimit.conf, storm-webdav.conf
Active: active (running) since mar 2021-04-20 10:48:28 CEST; 1s ago
Main PID: 7831 (java)
CGroup: /system.slice/storm-webdav.service
└─7831 /usr/bin/java -Xms4192m -Xmx4192m -Djava.io.tmpdir=/var/lib/storm-webdav/work -Dlogging.config=/etc/storm/webdav/logback.xml -jar /usr/share/java/storm-webdav/st...
apr 20 10:48:28 transfer-test.cr.cnaf.infn.it systemd[1]: storm-webdav.service: main process exited, code=exited, status=143/n/a
apr 20 10:48:28 transfer-test.cr.cnaf.infn.it systemd[1]: Stopped StoRM WebDAV service.
apr 20 10:48:28 transfer-test.cr.cnaf.infn.it systemd[1]: Unit storm-webdav.service entered failed state.
apr 20 10:48:28 transfer-test.cr.cnaf.infn.it systemd[1]: storm-webdav.service failed.
apr 20 10:48:28 transfer-test.cr.cnaf.infn.it systemd[1]: Started StoRM WebDAV service.
The service is restarted and its status is fine. But the exit code is 143 and it is wrongly treated as a failure when the service is stopped. This is the T1 transfer-test node linked to the storm-test testbed. It's easy not to notice. Which T1 deployment do you mean? We can check its systemctl status. |
Comment by Andrea Ceccanti [ 20/Apr/21 ] |
? |
Comment by Enrico Vianello [ 20/Apr/21 ] |
Also StoRM WebDAV has the same issue. Maybe it's something related to the transition to Java 11? |
Comment by Enrico Vianello [ 16/Apr/21 ] |
Tested, it works.
Apr 16 16:56:52 omii005-vm03.cnaf.infn.it systemd[1]: Stopped StoRM Backend service.
Apr 16 16:56:52 omii005-vm03.cnaf.infn.it systemd[1]: Started StoRM Backend service. |