Throwing this out for assistance...

Posted by IVSCOMM on Tue, 04/19/2011

First off, I know I am being very general in my description.

I have a 1.4 box, and we have been having random Asterisk restarts as far back as my logs go.

Digging through the forums, I see a common thread in eeman's posts about the evils of chan_local.

I found a thread that said don't use extension-based hunt groups; use device-based hunt groups instead.

I have used extension-based hunt groups liberally throughout my clients (close to 800 phones total). Could this cause random restarts of Asterisk under heavy call load?

eeman?


Submitted by choppower on Tue, 04/19/2011 Permalink

I'm running Asterisk 1.4.26.3 and I too am experiencing random restarts. Do your restarts consistently happen at a particular time of day? I use extension-based hunt groups for all of my customers as well, and I'm wondering if that is the cause. I have only been getting restarts recently.

Submitted by cbbs70a on Tue, 04/19/2011 Permalink

For what it's worth, I've seen this happen with multiple PBXs. They were crashes, not simple restarts; Asterisk usually restarts itself after a crash, so it can be confusing as to what is actually happening. It has always turned out to be one of the same two culprits: either the Digium FFA module or the H323 module. If you don't need a module, just don't load it. I hope this helps.
FSD

Submitted by eeman on Tue, 04/19/2011 Permalink

It's really easy to see the cause: simply run the debugger 'gdb' on the core file and read the backtrace. If the last application is a macro like tl-ringgroup-base, or the destination channels include the word Local/, then you'll know it was a chan_local bug (of which there are many) that caused the crash.
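For anyone who wants to automate that check, here is a minimal Python sketch, not a definitive tool. It assumes the Asterisk binary is at /usr/sbin/asterisk and that you already have the backtrace as text. The function names and the extra "chan_local" pattern are my own illustrative assumptions; only the "Local/" and "tl-ringgroup-base" hints come from the post above.

```python
import re

# Hints that a backtrace involves chan_local. "Local/" and
# "tl-ringgroup-base" come from the post above; "chan_local" as a source
# file or symbol name is an added assumption.
CHAN_LOCAL_HINTS = (
    re.compile(r"\bLocal/"),
    re.compile(r"tl-ringgroup-base"),
    re.compile(r"\bchan_local\b"),
)


def gdb_backtrace_command(core_path, binary="/usr/sbin/asterisk"):
    """Build a gdb command line that prints a full backtrace and exits.

    The default binary path is an assumption; adjust it for your install.
    """
    return ["gdb", "-batch", "-ex", "bt full", binary, core_path]


def chan_local_hints(backtrace_text):
    """Return the backtrace lines that match any chan_local hint."""
    return [
        line.strip()
        for line in backtrace_text.splitlines()
        if any(pattern.search(line) for pattern in CHAN_LOCAL_HINTS)
    ]
```

To use it, run the command list through subprocess.run(..., capture_output=True, text=True) and pass the resulting stdout to chan_local_hints. An empty list means none of these hints appeared, which does not by itself rule chan_local out.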

Submitted by IVSCOMM on Tue, 04/19/2011 Permalink

cbbs70a is right: I am actually getting a crash, not a random restart. I have core files going back to when I started this server. It seems to be happening one to two times a day, and on bad days as many as five. Is there a way to debug a core file, or do I have to seek out the wizards of Digium?

Submitted by IVSCOMM on Mon, 04/25/2011 Permalink

At long last, after a long weekend of upgrading our switch and installing all the proper stuff so we can evaluate a core dump, we found this...

Program terminated with signal 11, Segmentation fault.
#0 0x00e21649 in ast_masq_valetpark_call (chan=0xb4eab278, data=0xb627fed8)
at app_valetparking.c:312
312 app_valetparking.c: No such file or directory.
in app_valetparking.c
(gdb)

So it seems the cause of our switch crashing is app_valetparking.c. I suspect I configured something wrong when installing it.

Could somebody give me a hint about what to look at?

WHAT DOES IT ALL MEAN?

Submitted by IVSCOMM on Mon, 04/25/2011 Permalink

Park Multi

exten => s,1,Macro(tl-set-myvariables)
exten => s,n,Set(TIMEOUT=${ARG1})
exten => s,n,GotoIf($["${TIMEOUT}" != ""]?park)
exten => s,n,Set(TIMEOUT=360)
exten => s,n(park),ValetParkCall(auto|${tenant}|${TIMEOUT}|${MYEXTENSION}|1|from-inside${TL_DASH}${tenant})

UnPark Multi

exten => s,1,ValetUnParkCall(${MACRO_EXTEN:${ARG1}}|${tenant})

Submitted by eeman on Mon, 04/25/2011 Permalink

It's a bug in the .c code. I don't remember what triggers it, but if it's the cause of all your crashes, then that's not good, as there is no updated code that fixes it. The worst case I have seen with valetparking is one crash every few weeks. If all your core files debug to valetparking as the culprit, then you might have to figure out how to recreate the problem; otherwise you'll have to abandon its use :-(
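One way to check whether every core file points at the same culprit is to tally frame #0 across the gdb output from each core. This is a rough Python sketch. It assumes you have saved each gdb session's output as text, and the regex and function name are illustrative. It handles the frame format shown in this thread (function, arguments, then "at file:line"), but not arguments that contain nested parentheses.

```python
import re
from collections import Counter

# Matches a gdb frame #0 such as:
#   #0 0x00e21649 in ast_masq_valetpark_call (chan=..., data=...)
#   at app_valetparking.c:312
# The "at file:line" part may be on the same line or the next one.
FRAME0 = re.compile(
    r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>\w+)\s*\([^)]*\)"
    r"\s+at\s+(?P<file>[^\s:]+):(?P<line>\d+)",
    re.MULTILINE,
)


def tally_crash_sites(gdb_outputs):
    """Count how often each (function, file, line) appears as frame #0.

    Outputs without a recognizable frame #0 are counted under
    ("unknown", "", 0) so they are not silently dropped.
    """
    sites = Counter()
    for text in gdb_outputs:
        match = FRAME0.search(text)
        if match:
            key = (match.group("func"), match.group("file"), int(match.group("line")))
        else:
            key = ("unknown", "", 0)
        sites[key] += 1
    return sites
```

If one (function, file, line) key accounts for nearly all of the core files, that supports blaming a single module. A spread of keys suggests there is more than one problem.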

Submitted by IVSCOMM on Tue, 04/26/2011 Permalink

This is a brand new box with the latest version of Asterisk (1.4.40). I have two core dumps so far; these are the faults from yesterday's and today's...

Yesterday's Core Dump
Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x00e21649 in ast_masq_valetpark_call (chan=0xb4eab278, data=0xb627fed8)
at app_valetparking.c:312
312 app_valetparking.c: No such file or directory.
in app_valetparking.c
(gdb)

Today's Core Dump
Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x00dc0649 in ast_masq_valetpark_call (chan=0xb6937448, data=0xb62f7ed8)
at app_valetparking.c:312
312 app_valetparking.c: No such file or directory.
in app_valetparking.c
(gdb)

If anyone has some ideas, I'd love to hear them. Or am I just screwed with app_valetparking.c?

Submitted by eeman on Tue, 04/26/2011 Permalink

I think you're just going to have to abandon the application. The guy who wrote it refuses to maintain it, refuses to even publish the simple patches I made so that it would compile in > 1.4.26, and refuses to even answer my email when I try to contact him.

Submitted by eeman on Tue, 04/26/2011 Permalink

I couldn't tell you without more of the backtrace. One crash I saw happened when people were not picking up their parked calls. Maybe set the timeout to 1 hour and screw 'em if they forget to pick up their parked calls ;-)

Submitted by IVSCOMM on Wed, 04/27/2011 Permalink

I'm still not sure what part of that code was causing our crashes, but I removed the scripts so the feature could not be accessed and sent out an email telling everyone not to use it. So far today, no crash.

-Knocking Heavily on Wood

Submitted by raven on Wed, 05/11/2011 Permalink

On earlier systems (1.2, 1.4), I have noticed that if there is no hardware card to obtain timing from (such as a T1 card), and/or the dummy timing driver (zaptel) was not compiled or selected properly on a system without a hardware card, the system was very prone to crashes and restarts.