Ticket #34 (assigned defect)

Opened 2 years ago

Last modified 2 years ago

apache segfaults

Reported by: anonymous
Assigned to: bart (accepted)
Priority: major
Milestone: 0.9.6
Component: eAccelerator
Version: 0.9.5
Keywords:
Cc:

Description

hi,

i'm running eaccelerator (eaccelerator-svn200603271208) on two dual opteron machines with apache2-worker and php 5.1.2. eaccelerator speeds things up a lot, but unfortunately after running for some hours the apache log fills up with lots of "segmentation faults". i think about 50% of all requests crash apache.

after reloading or restarting apache (it doesn't matter which) the segfaults are gone for some hours, but then they appear again.

i tried setting apache's MaxRequestsPerChild to 20, 1000 or 0; it doesn't help.

can you please tell me how to compile a debug version so we can see exactly where inside php/eaccelerator apache crashes? does a debug build affect performance a lot?

corin

Change History

03/30/06 03:21:21 changed by bart

  • owner changed from somebody to bart.
  • status changed from new to assigned.
  • version set to 0.9.5.
  • component changed from Control panel to eAccelerator.

I think this is the bug we have been getting reports about since the start of the eAccelerator project. It has been extremely difficult for any of the developers to reproduce, which makes it very hard to debug. It seems to happen only on SMP systems under high concurrency. For now there isn't a solution, but we keep searching.

03/30/06 14:01:52 changed by anonymous

yes that's right, the crashes only happen on our dual opterons. we do load balancing with pound across 2 dual opterons and 1 athlon64. all 3 webservers have exactly the same configuration (kernel, compiler settings, compiler version etc.), serve the same website etc. only apache on the smp machines crashes; the athlon64 works perfectly. if i can help to resolve this, please let me know what to do. i can reproduce the crashes within a few hours... ;)

03/31/06 15:30:24 changed by sViruS@gmail.com

I have the same problem on a dual Xeon EM64T: under heavy load Apache sometimes logs a segmentation fault. By the way, do you have any dead processes in Apache? Sometimes after a crash Apache doesn't release the slot, and in server-status I see slots that have spent a long time in the "Sending" state. I use these optimization flags: -march=nocona -mtune=nocona -O3 -fomit-frame-pointer -fforce-addr -ftracer -mmmx -msse3 -mfpmath=sse

05/31/06 23:50:03 changed by adrian@ziemkowski.com

We've been having the same problem on SMP IBM Blades running Apache 1.3.34, and have tried both PHP 5.1.2 and 5.1.4, but I don't think it is due to the SMP setup.

I was able to reproduce it by hitting one of our pages on debug builds of php 5.1.2 and php 5.1.4 and the latest eaccelerator with JMeter for just 3-4 hours. I was running 40 concurrent threads against two complex pages that both produced memory leak debug messages immediately, but only with eaccelerator enabled (without eaccelerator, these messages never happen):

[Sat May 27 00:47:26 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
/opt/src/php-5.1.4/Zend/zend_vm_execute.h(8981) :  Freeing 0x08E6A4C4 (44 bytes), script=/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx
/opt/src/php-5.1.4/Zend/zend_API.c(763) : Actual location (location was relayed)
Last leak repeated 145 times
[Sat May 27 00:47:26 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
/opt/src/php-5.1.4/Zend/zend_hash.c(242) :  Freeing 0x0923A974 (47 bytes), script=/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx
Last leak repeated 916 times
[Sat May 27 00:47:26 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
/opt/src/php-5.1.4/Zend/zend_execute.c(1076) :  Freeing 0x08E8790C (32 bytes), script=/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx
/opt/src/php-5.1.4/Zend/zend_hash.c(169) : Actual location (location was relayed)
Last leak repeated 43 times
[Sat May 27 00:47:26 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
/opt/src/php-5.1.4/Zend/zend_execute.c(853) :  Freeing 0x0926D28C (16 bytes), script=/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx
Last leak repeated 83 times

As scary as that is, everything appears to work. After 3-4 hours with JMeter running the 40 concurrent requests, it changes to this output:

---------------------------------------
[Sat May 27 03:12:30 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
---------------------------------------
/opt/src/php-5.1.4/Zend/zend_variables.c(175) : Block 0xB05E2DB4 status:
/opt/src/php-5.1.4/Zend/zend_execute.h(64) : Actual location (location was relayed)
Beginning:      Overrun (magic=0x00000000, expected=0x7312F8DC)
      End:      Unknown
---------------------------------------
[Sat May 27 03:12:30 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
---------------------------------------
/opt/src/php-5.1.4/Zend/zend_variables.h(35) : Block 0xB05E2E60 status:
/opt/src/php-5.1.4/Zend/zend_variables.c(36) : Actual location (location was relayed)
Beginning:      Overrun (magic=0xB05E2E74, expected=0x7312F8DC)
      End:      Unknown
---------------------------------------
[Sat May 27 03:12:30 2006]  Script:  '/opt/htdocs/www.revelex.com/live/travel/cruise/search.rvlx'
---------------------------------------
/opt/src/php-5.1.4/Zend/zend_variables.c(175) : Block 0xB05E2E50 status:
/opt/src/php-5.1.4/Zend/zend_execute.h(64) : Actual location (location was relayed)
Beginning:      Overrun (magic=0x00676E70, expected=0x7312F8DC)
      End:      Unknown

If you look through the zend engine, you'll find that this message is being generated in zend_alloc.c while doing extra logic that only gets executed in debug mode. Running this same test without debug enabled ends up with apache segfaulting instead at this point without any eaccelerator log lines (unlike the disk cache corruption issue, which had eaccelerator debug lines).

When eaccelerator isn't loaded, we don't get the memory leaks or the overrun messages.
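
For readers unfamiliar with this kind of debug check: the "Overrun (magic=..., expected=0x7312F8DC)" lines above read like a guard-word test, where the allocator stores a known value in front of each block and complains if it has changed by the time the block is freed. The following is only a minimal sketch of that general technique, not the actual zend_alloc.c code; the names guard_header, guarded_alloc, guard_intact and guarded_free are hypothetical, and only the magic constant is taken from the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Guard value copied from the "expected=" field in the log above. */
#define GUARD_MAGIC 0x7312F8DCu

/* Hypothetical header stored directly in front of every payload. */
typedef struct {
    uint32_t magic; /* must still equal GUARD_MAGIC when the block is freed */
    size_t   size;  /* payload size requested by the caller */
} guard_header;

/* Allocate `size` bytes preceded by a guard header; returns the payload. */
static void *guarded_alloc(size_t size)
{
    guard_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->magic = GUARD_MAGIC;
    h->size = size;
    return h + 1;
}

/* Return 1 if the guard word in front of `ptr` is intact, 0 otherwise. */
static int guard_intact(const void *ptr)
{
    assert(ptr != NULL);
    const guard_header *h = (const guard_header *)ptr - 1;
    return h->magic == GUARD_MAGIC;
}

/* Free a guarded block. Returns 0 on success, or -1 if the guard word was
 * overwritten; in that case the block is deliberately left alone. */
static int guarded_free(void *ptr)
{
    if (!guard_intact(ptr))
        return -1;
    free((guard_header *)ptr - 1);
    return 0;
}
```

In a build without such a check, a clobbered header would go unnoticed until the allocator followed the damaged bookkeeping, which would fit the non-debug runs segfaulting at the same point.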

06/23/06 15:59:03 changed by szymon@mwg.pl

Oh. So it's eAccelerator! OMG. I've had this problem for a couple of weeks and didn't know what was wrong. But guys... I'm not on a dual server... It's a simple P4 with 1GB of RAM. Same problem. After a couple of hours Apache segfaults and clients get "Zero sized reply" or only half of the page loaded. How can I debug it?

06/30/06 16:21:57 changed by bart

Can you try to provide some sample PHP scripts and describe the setup you are using, maybe even the benchmark scripts, so we can try to find the problem? I have been trying for hours and hours for over a year and I've never hit the problem.

Do all scripts fit in memory? That is, once every script has been hit at least once, does eAccelerator still report any free memory?

07/03/06 16:52:54 changed by szymon@mwg.pl

Hm... Sample script? SquirrelMail. I've read somewhere that the problem can be caused by the disk cache. I switched to shm only and Apache stopped segfaulting.

08/02/06 08:27:20 changed by ajdonnison

I've managed to get a backtrace on an SMP system which may be of assistance. Let me know if there is anything else I can do to help track this down.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208080704 (LWP 22485)]
0x012753ec in _efree () from /etc/httpd/modules/libphp5.so
(gdb) bt
#0  0x012753ec in _efree () from /etc/httpd/modules/libphp5.so
#1  0x0127e4d9 in _zval_ptr_dtor () from /etc/httpd/modules/libphp5.so
#2  0x01290560 in zend_hash_destroy () from /etc/httpd/modules/libphp5.so
#3  0x01287f6e in _zval_dtor_func () from /etc/httpd/modules/libphp5.so
#4  0x0127e4d9 in _zval_ptr_dtor () from /etc/httpd/modules/libphp5.so
#5  0x0129060c in zend_hash_clean () from /etc/httpd/modules/libphp5.so
#6  0x0129fa14 in execute () from /etc/httpd/modules/libphp5.so
#7  0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#8  0x0127f642 in zend_call_function () from /etc/httpd/modules/libphp5.so
#9  0x012804d4 in call_user_function_ex () from /etc/httpd/modules/libphp5.so
#10 0x011fbe8a in zif_call_user_func_array () from /etc/httpd/modules/libphp5.so
#11 0x0129fd55 in execute () from /etc/httpd/modules/libphp5.so
#12 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#13 0x0129f730 in execute () from /etc/httpd/modules/libphp5.so
#14 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#15 0x0129f730 in execute () from /etc/httpd/modules/libphp5.so
#16 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#17 0x012c0265 in zend_get_zval_ptr () from /etc/httpd/modules/libphp5.so
#18 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#19 0x0129f730 in execute () from /etc/httpd/modules/libphp5.so
#20 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#21 0x012c0265 in zend_get_zval_ptr () from /etc/httpd/modules/libphp5.so
#22 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#23 0x0129f730 in execute () from /etc/httpd/modules/libphp5.so
#24 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#25 0x0129f730 in execute () from /etc/httpd/modules/libphp5.so
#26 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#27 0x0129f730 in execute () from /etc/httpd/modules/libphp5.so
#28 0x0129f4f9 in execute () from /etc/httpd/modules/libphp5.so
#29 0x01289527 in zend_execute_scripts () from /etc/httpd/modules/libphp5.so
#30 0x0125509a in php_execute_script () from /etc/httpd/modules/libphp5.so
#31 0x01301d33 in zend_get_zval_ptr () from /etc/httpd/modules/libphp5.so
#32 0x08068b28 in ap_run_handler ()
#33 0x08068e4d in ap_invoke_handler ()
#34 0x08065fbf in ap_process_request ()
#35 0x08060f25 in _start ()
#36 0x080725af in ap_run_process_connection ()
#37 0x08066deb in ap_graceful_stop_signalled ()
#38 0x080670ae in ap_graceful_stop_signalled ()
#39 0x0806716a in ap_graceful_stop_signalled ()
#40 0x08067a4e in ap_mpm_run ()
#41 0x0806de5e in main ()

08/16/06 20:39:39 changed by chrisd

Here is my experience:

Been using EA 0.9.3 for years under IIS W2K + single processor server: 0 crashes.
Under Apache with any PHP (4.x 5.x) and either EA 0.9.4 or 0.9.5RC1: 1 crash/week.
Then I added a "complex" site with about 5000 pages per day: 3 crashes/day.

So the error seems to be related to:
1) Apache only ?
2) Load (ie no load -> no crash and "more load" -> "more crash") ?

To try to identify a pattern, I've created a page, asking for feedback about this at: http://www.sitebuddy.com/php/eaccelerator_crash_apache_php_crashed_on_opline

Christophe D.

09/15/06 18:10:16 changed by bart

  • milestone set to 0.9.6.

I think this bug has been resolved in revision 272. I made a snapshot of the 0.9.5 branch so this can be tested.

http://snapshots.eaccelerator.net/eaccelerator-0.9.5-svn272.tar.gz

10/19/06 14:37:32 changed by tomlove

I updated to rev. 272 and it worked great for about 10 days on 3 high-concurrency dual Xeons. The usual random segfaults have gone.

However today I got a segfault which deadlocked apache 1.3.34. No backtrace or anything unfortunately, just the following:

[ 27770 ] EACCELERATOR: PHP crashed on opline 9 of eaccelerator_get() at /home/<removed>.php:406

[Thu Oct 19 07:19:40 2006] [notice] child pid 27770 exit signal Segmentation fault (11)
[Thu Oct 19 07:19:41 2006] [notice] child pid 27630 exit signal Segmentation fault (11)

This was happening a lot before I started using non-blocking locks on some of my _get() & _put() combinations. Not sure if there's any relationship but I still suspect eacc's locks.
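
The comment above doesn't say how the non-blocking locks were implemented, so the following is only a generic illustration of the idea: try to take the lock, and if someone else holds it, return at once instead of waiting. It is a sketch using POSIX flock() with LOCK_NB on a lock file, and has nothing to do with eAccelerator's internal locking; try_lock and unlock are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to take an exclusive lock on `path` without blocking.
 * Returns a descriptor >= 0 that holds the lock, -1 if the lock is
 * currently held elsewhere, or -2 on any other error. */
static int try_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -2;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        /* EWOULDBLOCK means another descriptor already holds the lock. */
        int busy = (errno == EWOULDBLOCK);
        close(fd);
        return busy ? -1 : -2;
    }
    return fd;
}

/* Release a lock obtained from try_lock() and close its descriptor. */
static void unlock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

A caller that gets -1 back can skip the cached value and recompute it, rather than queueing up behind a lock holder that may have crashed.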