Investigate CMS StoRM instabilities


    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: None
    • Component/s: None

      The CMS StoRM frontend (FE) and backend (BE) on storm-cms.cr.cnaf.infn.it have been unstable since last Saturday, with similar behavior on Saturday, Monday morning, Monday afternoon and finally last night. The details below refer to last night.

      Filesystem ok, load ok, memory ok, no network problems.

      At 4:33 am everything stops working: both the FE and the BE, which run on the same host, stop logging; monitoring.log shows 200 pending tasks (the maximum) and heartbeat.log shows zeros.

      Sensu sends several alarms saying the FE cannot be contacted (over both IPv4 and IPv6). Finally, at 6:50 am, the frontend is restarted and goes back to "normal".

      CMS doesn't use WebDAV. The GridFTP servers (xs-402 and xs-403) stop working in the same time range.
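
To confirm that the FE, the BE and the GridFTP servers really went silent in the same window, one option is to look for gaps between consecutive log timestamps. The snippet below is only a minimal sketch: the function name `find_logging_gaps`, the 5-minute threshold and the sample timestamps (including the placeholder date) are assumptions for illustration, and parsing real StoRM or GridFTP log lines into `datetime` objects is left out because it depends on the log format.

```python
from datetime import datetime, timedelta


def find_logging_gaps(timestamps, threshold=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where two consecutive
    log entries are further apart than `threshold`."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            gaps.append((previous, current))
    return gaps


# Made-up timestamps on a placeholder date, mimicking a service that
# stops logging at 4:33 and resumes at 6:50 (not taken from real logs).
day = datetime(2000, 1, 1)
sample = [
    day.replace(hour=4, minute=31),
    day.replace(hour=4, minute=32),
    day.replace(hour=4, minute=33),
    day.replace(hour=6, minute=50),
    day.replace(hour=6, minute=51),
]

gaps = find_logging_gaps(sample)
for last_seen, next_seen in gaps:
    print(f"silent from {last_seen:%H:%M} to {next_seen:%H:%M}")
```

Running the same check on each service's log and comparing the reported windows would show whether the outages overlap.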

        1. be_metrics.pdf (1.01 MB)
        2. be.PNG (65 kB)
        3. requests.png (47 kB)
        4. screenshot-1.png (44 kB)

            Assignee:
            Unassigned
            Reporter:
            Andrea Ceccanti
