Trunkout timeout on failure.

Posted by charlieboisseau on Tue, 09/22/2009

Hello,

Yesterday our provider had a power cut that left only one of their gateways running. This resulted in an overload of their server.

In practice, when we tried to set up a call from our side, we got no response from them. The last SIP message we saw was SIP/100. After about a minute we got a 503 error, and then we failed over internally to our second trunk. The obvious problem is that the caller has to wait around 60 seconds before the call is completed.

I have looked at tl-dialout-base, and it seems the only TIMEOUT option present is in the Dial() command, which obviously does not help me.

I was wondering if there is any way to detect that no response has been received from the first trunk after, let's say, 8 seconds at most, and then fail over to the second one, instead of having to wait for an eventual error from our provider.

Thank you very much for your time.


Submitted by eeman on Tue, 09/22/2009 Permalink

That's what qualify=value in the SIP settings does for you. Asterisk sends messages to the peer and times the latency of the response. If that duration exceeds the set value in milliseconds, it marks that channel as unavailable.

Submitted by eeman on Tue, 09/22/2009 Permalink

Asterisk sip qualify

SIP.conf: device configuration - qualify

Syntax:

qualify=xxx|no|yes

where xxx is the timeout in milliseconds. If set to yes, the default timeout of 2 seconds is used.

If you turn on qualify in the configuration of a SIP device in sip.conf, Asterisk will send a SIP OPTIONS command regularly to check that the device is still online. If the device does not answer within the configured (or default) period (in ms) Asterisk considers the device off-line for future calls. This status can be checked by the SIPPEER function, and inversely this function will only provide status information for peers which have qualify=yes.
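As a rough illustration of checking that status from the dialplan, here is a minimal extensions.conf sketch. The context name and the peer names provider-primary and provider-backup are hypothetical, and it assumes that SIPPEER(peer,status) returns a string starting with "OK" for a reachable, qualified peer. It is not taken from tl-dialout-base.

```ini
; extensions.conf - sketch only, names are placeholders
[outbound-sketch]
; Read the qualify status of the primary trunk (e.g. "OK (23 ms)", "UNREACHABLE")
exten => _X.,1,Set(PEERSTATE=${SIPPEER(provider-primary,status)})
; Skip straight to the backup trunk unless the status starts with "OK"
exten => _X.,n,GotoIf($["${PEERSTATE:0:2}" = "OK"]?primary:backup)
exten => _X.,n(primary),Dial(SIP/provider-primary/${EXTEN})
; Fall through to the backup only if the primary could not be reached
exten => _X.,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL"]?backup)
exten => _X.,n,Hangup()
exten => _X.,n(backup),Dial(SIP/provider-backup/${EXTEN})
exten => _X.,n,Hangup()
```

Note that this only helps when the provider stops answering OPTIONS requests. If the provider still answers OPTIONS but stalls on the INVITE, the peer will continue to look reachable.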

This feature may also be used to keep a UDP session open to a device that is located behind a network address translator (NAT). By sending the OPTIONS request, the UDP port binding in the NAT (on the outside address of the NAT/firewall device) is maintained by sending traffic through it. If the binding were to expire, there would be no way for Asterisk to initiate a call to the SIP device. This can be used in conjunction with the nat=yes setting.

By default chan_sip.c sends the qualify every 60 seconds. At least in 1.6.0 you can change this value with qualifyfreq. The value in qualify= represents the timeout after a packet is sent before we consider the peer to be unreachable. If the packet is not responded to within 1 second, Asterisk will keep trying until 7 packets have failed. At this point, Asterisk won't try again until the next 60-second cycle completes. If a packet is lost, which can easily happen with UDP, there are 7 more packets which are transmitted. Additionally, Asterisk will keep trying every 60 seconds, so even if all 7 packets are lost, Asterisk tries again at the next 60-second cycle. The number of retransmits and the time between each qualify are defined in chan_sip.c.
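A peer entry using these two settings might look like the sketch below. The peer name, host and numeric values are placeholders chosen for illustration, not recommendations. Whether qualifyfreq is accepted per peer or only in [general] may depend on your Asterisk version, so check the sip.conf.sample shipped with your install.

```ini
; sip.conf - sketch only, names and values are placeholders
[provider-primary]
type=peer
host=sip1.example.net   ; placeholder host
qualify=2000            ; consider the peer unreachable if the OPTIONS reply takes longer than 2000 ms
qualifyfreq=15          ; send OPTIONS every 15 seconds instead of the default 60 (1.6.0 or later)
```

A shorter qualifyfreq narrows the window in which a dead peer still looks reachable, at the cost of more OPTIONS traffic towards the provider.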

2008-08-27 - In v1.4 (SVN only) "qualify=yes" is ignored if the peer is realtime and caching is not turned on. See http://bugs.digium.com/view.php?id=13383.

Submitted by charlieboisseau on Tue, 09/22/2009 Permalink

Thank you very much for your prompt response, Erik.

As I had qualify=yes on the first trunk, the default timeout is meant to be 2 seconds.

In yesterday's scenario, 8 lost packets would mean 16 seconds of delay before Asterisk decided that the channel was down. What I think is that our provider was still responding to those SIP OPTIONS packets, as calls were only going to our failover trunk after a while (between 1 and 2 minutes). However, without a proper SIP trace I cannot be sure whether this was happening or not.

Anyway, thank you for your explanation, which makes things a little clearer.

Marc