Bug
Resolution: Fixed
Major
Security Level: Public (Visible by non-authn users.)
This ticket duplicates https://savannah.cern.ch/bugs/?101108 and refers to https://ggus.eu/ws/ticket_info.php?ticket=92492
The fix concerns:
1) After configuring with yaim, many tomcat6 errors are logged in catalina.out:
java.lang.IllegalArgumentException: Document base /usr/share/tomcat6/webapps/ce-cream-es does not exist
SEVERE: A web application appears to have started a thread named [Timer-4] but has failed to stop it. This is very likely to create a memory leak.
SEVERE: A web application created a ThreadLocal with key of type [null] (value [org.apache.axiom.util.UIDGenerator$1@4a88e4c0]) and a value of type [long[]] (value [[J@24edb15c]) but failed to remove it when the web application was stopped. To prevent a memory leak, the ThreadLocal has been forcibly removed.
After a while the CE starts swapping and becomes unhealthy.
WORKAROUND:
rm -f /usr/share/tomcat6/conf/Catalina/localhost/ce-cream-es.xml
/etc/init.d/tomcat6 stop && /etc/init.d/glite-ce-blah-parser stop && sleep 3 && /etc/init.d/glite-ce-blah-parser start && /etc/init.d/tomcat6 start
SOLUTION: Have this fixed in the next update
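The "Document base ... does not exist" error suggests a leftover context descriptor pointing at a webapp that is no longer deployed. A minimal sketch for spotting such descriptors before removing them by hand follows. It assumes each descriptor carries a docBase="..." attribute, which was not checked against the actual ce-cream-es.xml; the function name list_stale_contexts and its two arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every Tomcat context descriptor whose docBase
# points at a path that does not exist.
# Assumption: descriptors contain a docBase="..." attribute on one line.
list_stale_contexts() {
    conf_dir=$1                                   # e.g. .../conf/Catalina/localhost
    app_base=${2:-/usr/share/tomcat6/webapps}     # used for relative docBase values
    for ctx in "$conf_dir"/*.xml; do
        [ -f "$ctx" ] || continue
        doc_base=$(sed -n 's/.*docBase="\([^"]*\)".*/\1/p' "$ctx" | head -n 1)
        # Descriptors without a docBase attribute are skipped, not reported.
        [ -n "$doc_base" ] || continue
        case $doc_base in
            /*) ;;                                # absolute path, use as-is
            *)  doc_base=$app_base/$doc_base ;;   # relative path, resolve against app_base
        esac
        # docBase may be a directory or a .war file, so test plain existence.
        [ -e "$doc_base" ] || printf '%s\n' "$ctx"
    done
}

list_stale_contexts "${1:-/usr/share/tomcat6/conf/Catalina/localhost}"
```

The script only lists candidates; deleting a descriptor and restarting the services is still done as in the workaround above.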
2) [root@ce01-lcg ~]# cat /etc/glite-ce-cream/log4j.properties | egrep 'MaxFileSize|MaxBackupIndex'
log4j.appender.fileout.MaxFileSize=1000KB
log4j.appender.fileout.MaxBackupIndex=20
These values are too small for a production environment: an entire job lifecycle does not fit in 20 MB of logs. Furthermore, any run of yaim restores the small values.
WORKAROUND:
Modify /etc/glite-ce-cream/log4j.properties:
log4j.appender.fileout.MaxFileSize=10MB
Then make the file immutable to keep a later yaim run from restoring the old values:
chattr +i /etc/glite-ce-cream/log4j.properties
SOLUTION: Have this fixed in the next update
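For reference, the 20 MB figure follows from the rolling appender keeping the active file plus MaxBackupIndex backups, so the retained volume is roughly MaxFileSize * (MaxBackupIndex + 1). The sketch below computes that from a properties file. The function names size_to_bytes and log_capacity_bytes are hypothetical, and only the fileout appender keys quoted above are read.

```shell
#!/bin/sh
# Convert a log4j-style size (plain bytes, or a KB/MB/GB suffix) to bytes.
size_to_bytes() {
    value=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case $value in
        *KB) echo $(( ${value%KB} * 1024 )) ;;
        *MB) echo $(( ${value%MB} * 1024 * 1024 )) ;;
        *GB) echo $(( ${value%GB} * 1024 * 1024 * 1024 )) ;;
        ''|*[!0-9]*) echo "unsupported size: $1" >&2; return 1 ;;
        *)   echo "$value" ;;
    esac
}

# Hypothetical helper: upper bound on retained log volume for the
# "fileout" appender, i.e. the active file plus all rolled backups.
log_capacity_bytes() {
    props=$1
    size=$(sed -n 's/^log4j\.appender\.fileout\.MaxFileSize=//p' "$props" | tail -n 1)
    backups=$(sed -n 's/^log4j\.appender\.fileout\.MaxBackupIndex=//p' "$props" | tail -n 1)
    [ -n "$size" ] && [ -n "$backups" ] || return 1
    bytes=$(size_to_bytes "$size") || return 1
    echo $(( bytes * (backups + 1) ))
}

# Only run when a properties file is given on the command line.
if [ "$#" -gt 0 ]; then
    log_capacity_bytes "$1"
fi
```

With the shipped values (1000KB, 20 backups) this gives 21504000 bytes, about 20.5 MiB; with 10MB and the same backup count it gives about 210 MiB.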
3) After configuring with yaim, services are up, but the CE remains unresponsive:
[sdalpra@ui01-ad32 ~]$ glite-ce-job-submit -a -r ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam my.jdl
2013-03-14 14:41:23,596 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection timed out]
[sdalpra@ui01-ad32 ~]$ glite-ce-job-submit -a -r ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam my.jdl
2013-03-14 14:43:10,813 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection timed out]
Tomcat is actually in an unhealthy state:
[root@ce01-lcg ~]# service tomcat6 status
tomcat6 (pid 20389) is running... [ OK ]
[root@ce01-lcg ~]# service tomcat6 stop
Stopping tomcat6: [FAILED]
WORKAROUND:
service glite-ce-blah-parser stop
service tomcat6 stop && service glite-ce-blah-parser stop && sleep 3 && service glite-ce-blah-parser start && service tomcat6 start
Then it works:
[sdalpra@ui01-ad32 ~]$ glite-ce-job-submit -a -r ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam my.jdl
https://ce01-lcg.cr.cnaf.infn.it:84...
SOLUTION: Have this fixed in the next update
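Until the fix is released, the restart order from the workaround can be wrapped in a small function so that it is applied the same way each time. This is only a sketch: SERVICE_CMD and RESTART_PAUSE are hypothetical overrides added here to allow a dry run, and the sequence stops at the first failing step, just like the && chain above.

```shell
#!/bin/sh
# Restart tomcat6 and the BLAH parser in the order used by the workaround.
# SERVICE_CMD (hypothetical) defaults to the real `service` command;
# RESTART_PAUSE (hypothetical) defaults to the 3 seconds used above.
restart_cream_stack() {
    svc=${SERVICE_CMD:-service}
    pause=${RESTART_PAUSE:-3}
    # Each step runs only if the previous one succeeded.
    $svc tomcat6 stop &&
    $svc glite-ce-blah-parser stop &&
    sleep "$pause" &&
    $svc glite-ce-blah-parser start &&
    $svc tomcat6 start
}
```

Running SERVICE_CMD=echo RESTART_PAUSE=0 restart_cream_stack prints the four steps without touching any service; calling restart_cream_stack as root performs the actual restart.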