Type: Task
Resolution: Fixed
Priority: Major
Affects Version/s: 1.11.19
Component/s: None
Security Level: Public (Visible by non-authn users.)
At 14:48 yesterday, the following line was logged in storm-backend.log on storm-atlas.cr.cnaf.infn.it:
11/30 14:48:30.204 Thread 27 - ERROR [44918c6c-894c-4094-8251-669888f43706]: rpcResponseHandler_Ls : ERROR: XML-RPC Fault: RPC failed at server. Failed to invoke method ls in class it.grid.storm.xmlrpc.XMLRPCMethods: GC overhead limit exceeded (code: 0)
Then, both frontend endpoints started logging only messages such as
11/30 21:52:27.246 Main - INFO [?]: acceptRequest : Error in soap_socket. Error 24
and everything was completely stuck (no requests processed) until the backend service was manually restarted at 22:00.
Plenty of Sensu checks raised alarms, and of course we received a GGUS ticket at 20:00.
There were no problems with the underlying filesystem, and the endpoint was definitely not receiving too many requests before it got stuck.
Following the restart, I increased the number of connections to the db from 2000 to 4000, but I don't know whether the connection limit was the actual cause of the problem.
For the sake of completeness: yesterday at 14:27 the network team reported that "dxcnaf.cnaf.infn.it (primary CNAF DNS) was unreachable".
Thanks for your help.
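To help line up the backend error with the frontend errors, here is a minimal Python sketch that parses the two log lines quoted above and computes the time between them. The line layout is inferred only from those two samples and may not cover every message in the real logs. The names `parse_line` and `first_with` are hypothetical helpers, and `YEAR` is an arbitrary placeholder because the log timestamps carry no year.

```python
import re
from datetime import datetime

# Layout inferred from the two sample lines in this ticket:
# "MM/DD HH:MM:SS.mmm <thread> - <LEVEL> [<request id>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>.+?) - (?P<level>[A-Z]+) "
    r"\[(?P<request_id>[^\]]*)\]: (?P<message>.*)$"
)

# Placeholder only: the log lines do not include a year.
YEAR = 2000

BACKEND_LINE = (
    "11/30 14:48:30.204 Thread 27 - ERROR [44918c6c-894c-4094-8251-669888f43706]: "
    "rpcResponseHandler_Ls : ERROR: XML-RPC Fault: RPC failed at server. "
    "Failed to invoke method ls in class it.grid.storm.xmlrpc.XMLRPCMethods: "
    "GC overhead limit exceeded (code: 0)"
)
FRONTEND_LINE = (
    "11/30 21:52:27.246 Main - INFO [?]: acceptRequest : "
    "Error in soap_socket. Error 24"
)


def parse_line(line, year):
    """Return a dict for one log line, or None if the layout does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Prepend the caller-supplied year so the timestamp parses unambiguously.
    record["ts"] = datetime.strptime(
        f"{year}/{record['ts']}", "%Y/%m/%d %H:%M:%S.%f"
    )
    return record


def first_with(lines, needle, year):
    """Return the first parsed record whose message contains `needle`."""
    for line in lines:
        record = parse_line(line, year)
        if record is not None and needle in record["message"]:
            return record
    return None


lines = [BACKEND_LINE, FRONTEND_LINE]
gc_error = first_with(lines, "GC overhead limit exceeded", YEAR)
socket_error = first_with(lines, "Error in soap_socket", YEAR)

# Time between the quoted backend error and the quoted frontend error.
gap = socket_error["ts"] - gc_error["ts"]
print(gc_error["thread"], gc_error["level"], gc_error["ts"].time())
print(socket_error["thread"], socket_error["level"], socket_error["ts"].time())
print("gap:", gap)
```

On the real files, `lines` would be read from the backend and frontend logs instead of the two quoted samples. Separately, if `Error 24` is an operating-system errno rather than a gSOAP status code (an assumption; the log does not say which), on Linux it would be `EMFILE`, "too many open files".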
relates to: STOR-896 Investigate on a DBConnectionPool SQL Error due to inactivity (Closed)