Type: Task
Resolution: Fixed
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

The CMS FE and BE (storm-cms.cr.cnaf.infn.it) have been unstable since last Saturday, showing the same behavior on Saturday, Monday morning, Monday afternoon and finally last night. The details below refer to last night's occurrence.

Filesystem ok, load ok, memory ok, no network problems.

At 4:33 am everything stops working: both FE and BE, which are on the same host, stop logging; monitoring.log shows 200 tasks pending (the maximum) and heartbeat.log shows 0s.

Sensu sends several alarms saying the FE cannot be contacted (over both IPv4 and IPv6). Finally, at 6:50 am, the frontend is restarted and goes back to "normal".

CMS doesn't use WebDAV. The GridFTP servers (xs-402 and xs-403) stop working in the same time range.
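
To compare the four occurrences, it may help to extract the exact window in which a log went silent. The following is a minimal sketch, assuming the timestamps have already been parsed from the log of interest; the function name `find_logging_gaps`, the 5-minute threshold and the sample timestamps are illustrative assumptions (the sample only mirrors the 4:33 am and 6:50 am times reported above, with an assumed date, and is not taken from the actual logs).

```python
from datetime import datetime, timedelta


def find_logging_gaps(timestamps, min_gap=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs for consecutive log entries
    that are more than min_gap apart."""
    ordered = sorted(timestamps)
    gaps = []
    # Walk over each pair of neighbouring entries and keep the long pauses.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > min_gap:
            gaps.append((earlier, later))
    return gaps


# Hypothetical timestamps (assumed date), modelled on the times in this report.
sample = [
    datetime(2020, 4, 7, 4, 31),
    datetime(2020, 4, 7, 4, 32),
    datetime(2020, 4, 7, 4, 33),
    datetime(2020, 4, 7, 6, 50),
    datetime(2020, 4, 7, 6, 51),
]

for start, end in find_logging_gaps(sample):
    print(f"no log entries between {start:%H:%M} and {end:%H:%M} ({end - start})")
```

Running the same check on the FE, BE and GridFTP logs would show whether they all go silent at the same minute or one of them stops first.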

be_metrics.pdf (1.01 MB, 10/Apr/20 3:06 PM)
be.PNG (65 kB, 08/Apr/20 2:14 PM)
requests.png (47 kB, 07/Apr/20 5:37 PM)
screenshot-1.png (44 kB, 08/Apr/20 12:30 PM)

relates to

STOR-1174 Include thread pool metrics reporting in storm-backend-metrics log (Closed)
STOR-1198 Add Date to Backend's metrics log (Closed)

Assignee: Unassigned
Reporter: Andrea Ceccanti
Votes: 0
Watchers: 3

Created: 07/Apr/20 11:02 AM
Updated: 27/May/21 6:48 AM
Resolved: 04/May/20 11:00 AM
