Asterisk Crashing Randomly

Posted by conraddewet on Mon, 08/08/2011

Ok, I have a really annoying problem on one of our production PBXs....

Thirdlane MTE's Asterisk (1.6.0.6) is crashing every 5-6 days or so... here's what I know:
- I cannot replicate the crash
- When it crashes, all calls are dropped.
- When it crashes the last lines in the CLI are usually -- Called 012345678@TennantsTrunk (but not always, and not on the same trunk)
- There are on average 40 - 50 concurrent channels that get disconnected.
- All 300 or so extensions re-register directly after.
- It's really annoying and causing our clients to complain... it's not really a business model we can work on - it seems that if we can't fix this we will essentially be out of business.

Most of what I have read explains how to "debug" Asterisk but really... it's just not going to work with this amount of information going through the box.

So my questions are:
Is this just how stable Asterisk is - a "just accept it" type of thing?

Is there any crash reporting (just in time debugging etc etc)?

If we cannot replicate the cause, it leads me to believe it's either a "core" issue (something in the code) or something in the way we have set things up. Either way, what steps can we take to ensure that Asterisk simply "never" crashes? (Or is that just wishful thinking?)

I would also like to hear what other people's "crash-per-day" rates are.

Be honest now... Is Asterisk really the kind of software we can build an enterprise VoIP business on? Should we be looking at something like OpenSER, etc? (I'm just asking... someone has to)


Submitted by conraddewet on Mon, 08/08/2011 Permalink

Ok, so I have the file in the /tmp folder. It's a 60MB file, named with the date that Asterisk crashed.
So now what do I read it with? Opening it with a text reader doesn't show any readable or usable information. Kinda looking for an "asterisk crash here: xxx.c on line x, because y". ;)

(Sorry to sound stupid about this - first time with this.)
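
The file is unreadable in a text reader because a core dump is a binary ELF image of the process's memory, not a log. As a minimal sketch, assuming the standard ELF header layout, the Python below checks whether a file is an ELF core dump at all. The function name `is_elf_core` is made up for illustration.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value the ELF format assigns to core files


def is_elf_core(path):
    """Return True if the file at `path` has an ELF header marked as a core dump."""
    with open(path, "rb") as f:
        header = f.read(18)  # 16-byte e_ident followed by the 2-byte e_type
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return False
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian
    byte_order = ">" if header[5] == 2 else "<"
    (e_type,) = struct.unpack(byte_order + "H", header[16:18])
    return e_type == ET_CORE
```

Calling `is_elf_core()` on the file in /tmp should return True if it really is a core dump. The `file` command reports the same thing from the shell. Either way, the readable "crashed here" information only comes out when the dump is loaded into a debugger such as gdb.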

Submitted by conraddewet on Mon, 08/08/2011 Permalink

Thanks, I have been reading about it.

Backtracing a core dump file in /tmp

start Asterisk with safe_asterisk
enter "gdb asterisk core.xxxx"
enter "bt" while in gdb (or do a "bt full")
enter "thread apply all bt"
Naturally you'll need to have gdb installed on your system
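
The gdb steps above can also be run non-interactively, so the whole backtrace ends up in a text file that can be attached to a bug report. This is only a sketch: the function names are made up, and it assumes gdb is on the PATH and that `-batch` and `-ex` behave as documented in the gdb manual.

```python
import subprocess


def build_gdb_command(binary, core_file):
    """Build a gdb command line that prints the backtraces and exits."""
    return [
        "gdb",
        "-batch",                      # run the commands below, then exit
        "-ex", "bt",
        "-ex", "bt full",
        "-ex", "thread apply all bt",
        binary,
        core_file,
    ]


def write_backtrace(binary, core_file, out_path):
    """Run gdb and save everything it prints to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_gdb_command(binary, core_file),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,  # gdb may exit non-zero; whatever it printed is still kept
        )
```

For example, `write_backtrace("asterisk", "core.xxxx", "backtrace.txt")` mirrors the "gdb asterisk core.xxxx" step, with core.xxxx standing in for the real file name. The output file name is an arbitrary choice.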

Submitted by conraddewet on Mon, 08/08/2011 Permalink

Second, your copy of Asterisk must have been built without optimization or the backtrace will be (nearly) unusable. This can be done by selecting the 'DONT_OPTIMIZE' option in the Compiler Flags submenu in the 'make menuselect' tree before building Asterisk.

I used the ISO, so not sure if this is the case?

Submitted by mattdarnell on Mon, 08/08/2011 Permalink

Aloha,

We haven't used the ISO in an MTE environment. It has done us well in STE installations.

Do you know what version of Asterisk you are running? I am not a coder but it seems that crashes are easier to fix than deadlocks.

We are still on the 1.4 branch and have not had a crash/deadlock in over 1 year. Previous versions of 1.4 were very unstable for us.

If I were you, I would hire Erik to take a look at your box and get everything running the latest code. He helped us with our install and it was/is worth every penny.

-Matt

Submitted by conraddewet on Tue, 08/09/2011 Permalink

The version is Asterisk (1.6.0.6). But I think you are right, even if we go to the trouble of gdb-ing the crash dumps... so what, it's not like I will be able to really do anything about it. I cannot replicate the crash - it's random. I'm sure it's just a buggy version.

Erik, if you don't mind, I would like to contact you off the forum. Or should I make contact via your BluegrassNet Voice site? Happy to pay for your time.

Submitted by eeman on Tue, 08/09/2011 Permalink

Restarting Asterisk nightly won't help, because it's not a case where you have a bucket filling with water and emptying the bucket nightly somehow avoids the overflow. The crashes are event driven: they happen when the right combination of events occurs.

Feel free to contact me at the email address at the bottom of my signature. It's written cryptically to keep spam collectors from grabbing it. If you read it out loud you'll see what I mean by written cryptically.