Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.2
    • Security Level: Public (Visible to non-authenticated users.)
    • Labels:
      None

      Description

      This ticket duplicates https://savannah.cern.ch/bugs/?101108 and refers to https://ggus.eu/ws/ticket_info.php?ticket=92492

      The fix concerns:

      1) After configuring with yaim, many tomcat6 errors are logged in catalina.out:

      java.lang.IllegalArgumentException: Document base /usr/share/tomcat6/webapps/ce-cream-es does not exist

      SEVERE: A web application appears to have started a thread named [Timer-4] but has failed to stop it. This is very likely to create a memory leak.

      SEVERE: A web application created a ThreadLocal with key of type [null] (value [org.apache.axiom.util.UIDGenerator$1@4a88e4c0]) and a value of type [long[]] (value [[J@24edb15c]) but failed to remove it when the web application was stopped. To prevent a memory leak, the ThreadLocal has been forcibly removed.

      After a while the CE starts swapping and becomes unhealthy.

      WORKAROUND:
      rm -f /usr/share/tomcat6/conf/Catalina/localhost/ce-cream-es.xml
      /etc/init.d/tomcat6 stop && /etc/init.d/glite-ce-blah-parser stop && sleep 3 && /etc/init.d/glite-ce-blah-parser start && /etc/init.d/tomcat6 start
      SOLUTION: Have this fixed in the next update
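
The stale context descriptor removed in the workaround above can be located before deleting it. This is a minimal sketch, assuming the tomcat6 paths named in this ticket and assuming that a descriptor NAME.xml is expected to have a matching webapps/NAME directory or webapps/NAME.war. Descriptors that set an explicit docBase elsewhere would show up as false positives. The function name find_stale_contexts is hypothetical.

```shell
# List context descriptors that have no matching webapp directory or WAR.
# Usage: find_stale_contexts [CONTEXT_DIR] [WEBAPPS_DIR]
# Defaults are the tomcat6 paths quoted in this ticket (an assumption for
# any other installation).
find_stale_contexts() {
    ctx_dir="${1:-/usr/share/tomcat6/conf/Catalina/localhost}"
    webapps="${2:-/usr/share/tomcat6/webapps}"
    for f in "$ctx_dir"/*.xml; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=$(basename "$f" .xml)
        # Report the descriptor only if neither a directory nor a WAR exists.
        if [ ! -d "$webapps/$name" ] && [ ! -f "$webapps/$name.war" ]; then
            echo "$f"
        fi
    done
}
```

Called with no arguments, the function prints the path of each suspect descriptor, one per line, and deletes nothing.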

      2) [root@ce01-lcg ~]# cat /etc/glite-ce-cream/log4j.properties | egrep 'MaxFileSize|MaxBackupIndex'
      log4j.appender.fileout.MaxFileSize=1000KB
      log4j.appender.fileout.MaxBackupIndex=20

      These values are too small for a production environment: an entire job lifecycle does not fit in 20MB of logs. Furthermore, any run of yaim restores these too-small values.

      WORKAROUND:
      modify /etc/glite-ce-cream/log4j.properties :
      log4j.appender.fileout.MaxFileSize=10MB

      chattr +i /etc/glite-ce-cream/log4j.properties

      SOLUTION: Have this fixed in the next update
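
The 20MB figure above follows from the two settings: a log4j 1.x RollingFileAppender keeps the active file plus MaxBackupIndex backups, so the ceiling is MaxFileSize * (MaxBackupIndex + 1). A rough sketch for checking a given properties file follows. The function name log4j_capacity_kb is hypothetical, and only the KB, MB and GB suffixes are handled.

```shell
# Print the maximum space (in KB) the rolling log files can occupy:
# MaxFileSize * (MaxBackupIndex + 1), i.e. the active file plus its backups.
# Usage: log4j_capacity_kb PROPERTIES_FILE [APPENDER_NAME]
log4j_capacity_kb() {
    file="$1"
    appender="${2:-fileout}"             # appender name used in this ticket
    size=$(sed -n "s/^log4j\.appender\.$appender\.MaxFileSize=//p" "$file" | tail -n 1)
    backups=$(sed -n "s/^log4j\.appender\.$appender\.MaxBackupIndex=//p" "$file" | tail -n 1)
    backups="${backups:-1}"              # log4j 1.x default when unset
    case "$size" in
        *KB) kb=${size%KB} ;;
        *MB) kb=$(( ${size%MB} * 1024 )) ;;
        *GB) kb=$(( ${size%GB} * 1024 * 1024 )) ;;
        *)   echo "unrecognised MaxFileSize: $size" >&2; return 1 ;;
    esac
    echo $(( kb * (backups + 1) ))
}
```

With the values shown in this ticket (1000KB and 20), the function prints 21000, which is roughly the 20MB mentioned above.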

      3) After configuring with yaim, services are up, but the CE remains unresponsive:

      [sdalpra@ui01-ad32 ~]$ glite-ce-job-submit -a -r ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam my.jdl
      2013-03-14 14:41:23,596 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection timed out]

      [sdalpra@ui01-ad32 ~]$ glite-ce-job-submit -a -r ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam my.jdl
      2013-03-14 14:43:10,813 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection timed out]

      Tomcat is actually in an ill state:

      [root@ce01-lcg ~]# service tomcat6 status
      tomcat6 (pid 20389) is running... [ OK ]
      [root@ce01-lcg ~]# service tomcat6 stop
      Stopping tomcat6: [FAILED]

      WORKAROUND:
      service glite-ce-blah-parser stop
      service tomcat6 stop && service glite-ce-blah-parser stop && sleep 3 && service glite-ce-blah-parser start && service tomcat6 start

      Then it works:
      [sdalpra@ui01-ad32 ~]$ glite-ce-job-submit -a -r ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam my.jdl
      https://ce01-lcg.cr.cnaf.infn.it:84...

      SOLUTION: Have this fixed in the next update
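
The restart order in the workaround above can be wrapped in a small function so that it can be rehearsed without touching the services. This is only a sketch: the function name restart_cream and the SERVICE_CMD and RESTART_PAUSE variables are hypothetical, and the command order is copied from the workaround in this ticket.

```shell
# Run the restart sequence from the workaround, in the same order.
# SERVICE_CMD (hypothetical) replaces "service", e.g. SERVICE_CMD=echo for a
# dry run; RESTART_PAUSE (hypothetical) overrides the 3-second pause.
restart_cream() {
    svc="${SERVICE_CMD:-service}"
    pause="${RESTART_PAUSE:-3}"
    # Stop the BLAH parser first, as in the workaround.
    $svc glite-ce-blah-parser stop
    # Then the chained sequence; it aborts at the first failing step.
    $svc tomcat6 stop &&
        $svc glite-ce-blah-parser stop &&
        sleep "$pause" &&
        $svc glite-ce-blah-parser start &&
        $svc tomcat6 start
}
```

Running SERVICE_CMD=echo RESTART_PAUSE=0 restart_cream prints the five service invocations in order instead of executing them.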

          People

          • Assignee:
            andreett Paolo Andreetto
            Reporter:
            andreett Paolo Andreetto