    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: 1.11.19
    • Component/s: None
    • Security Level: Public (Visible to non-authenticated users.)

      At 14:48 yesterday (11/30) the following line was logged in storm-backend.log on storm-atlas.cr.cnaf.infn.it:

      11/30 14:48:30.204 Thread 27 - ERROR [44918c6c-894c-4094-8251-669888f43706]: rpcResponseHandler_Ls : ERROR: XML-RPC Fault: RPC failed at server. Failed to invoke method ls in class it.grid.storm.xmlrpc.XMLRPCMethods: GC overhead limit exceeded (code: 0)

      Then, both frontend endpoints started logging only messages such as
      11/30 21:52:27.246 Main - INFO [?]: acceptRequest : Error in soap_socket. Error 24
      and everything was completely stuck (no requests processed) until the backend service was manually restarted at 22:00.
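
      As a diagnostic aid, here is a minimal Python sketch for interpreting the "Error 24" above. It assumes that the number printed by the frontend is the operating system errno (on Linux, errno 24 is EMFILE, "Too many open files"); the log itself does not confirm this. The helper names describe_errno and count_open_fds are hypothetical, and the file descriptor count relies on the Linux /proc filesystem.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def count_open_fds(pid):
    """Count the open file descriptors of a process via /proc (Linux only).

    Returns None if /proc/<pid>/fd does not exist or cannot be read,
    for example on a non-Linux host or without sufficient permissions.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    # Assumption: "Error 24" in the frontend log is an OS errno value.
    print(describe_errno(24))
    # Replace os.getpid() with the PID of the process under investigation.
    print(count_open_fds(os.getpid()))
```

      Comparing the descriptor count of the frontend process with its open-files limit (for example, the "Max open files" row in /proc/<pid>/limits) would show whether it was running out of file descriptors.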

      Many Sensu checks raised alarms, and of course we received a GGUS ticket at 20:00.

      There were no problems with the underlying filesystem, and the endpoint was definitely not receiving too many requests before it got stuck.

      Following the restart, I increased the number of connections to the db from 2000 to 4000, but I don't know whether the connection limit was the actual cause of the problem.

      For the sake of completeness: yesterday at 14:27 the network team reported that "dxcnaf.cnaf.infn.it (primary CNAF DNS) was unreachable".

      Thanks for your help.

            Assignee:
            Unassigned
            Reporter:
            Lucia Morganti
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: