
Thirdlane Connect WebRTC issues

Posted by jredmond on Fri, 08/17/2018

I'm at my wits' end with this one...

I'm running the most recent Thirdlane 9.0.2 with Connect 2.0.2.

I have working STUN and TURN servers set up, and I've tested them using Trickle ICE (https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/), so I know they are passing the outside IP and relay candidates to the client. One of these is the CoTURN instance that's installed with the Thirdlane appliance. I'm using default ports.

Enabling or disabling the STUN/TURN servers has no effect.
Dropping the iptables firewall has no effect.

The Connect client simply refuses to set up a WebRTC connection to the telephony server, so calling is disabled in the Connect app. It's the same on all platforms I've tried: Chrome, Windows, and Android.

However, I can get status updates and check voicemail with audio working for the voicemail.

My Thirdlane is multi-homed single-tenant. I have a public IP on one interface and two LAN interfaces. One LAN connects to the local handsets, the second to the AT&T SIP gateway. The default route is set up to go out the WAN interface with the public IP. I use static routes to handle the LAN subnets.

Everything had been working some time ago, before I switched to a GoDaddy cert. I switched because of problems getting the previously used Let's Encrypt cert to renew itself without issues. After beating my head against the wall long enough on that, I went to a GoDaddy cert. Then Connect quit working. At first I couldn't log in at all, because I hadn't updated some leftover config that specifies the intermediate certs. Once I did that, I could log into Connect. But then the problems with WebRTC cropped up.

I initially had challenges with WebRTC when we first deployed and I was able to resolve them at some point. This time no go.

I get log entries like this in Connect...

[8/17/2018, 4:30:27 PM] [WebRTC Srv] - A sudden disconnect. Attempting to reconnect in 10 seconds

Other log entries are not very useful, and I must say this one isn't offering many clues. Running asterisk -rdddddddddddvvvvvvvvvvvvvvv from the bash CLI shows a lot of entries like
chan_sip.c:4208 __sip_reliable_xmit: Serious Network Trouble; __sip_xmit returns error for pkt data

These go away when I flip my extension back to SIP from WebRTC.

Anything I should try? Or logs you'd like to see?


Submitted by eugene.voityuk on Fri, 08/17/2018

Connect uses two different media and signaling paths for calls. There is a big difference underneath in what happens when you click "call" or "call with video" and when you click "call (extension, mobile, home office)". The first type of call establishes a P2P connection, and the second type goes via the media server. If you have an issue with both types of calls, then it is more likely to be an issue with networking or the CoTURN server. I suggest you read up a bit more and get familiar with webrtc-internals in Chrome. Here are some useful links to start from: https://testrtc.com/webrtc-api-trace/ and https://testrtc.com/webrtc-internals-documentation/. At least you will be able to see something useful: whether there is a lack of candidates, or they can't pass ICE connectivity checks, or signalling is not working, etc. Alternatively, you may contact Alex for the support package, and our support team will deal with the issues you are facing, as such troubleshooting is usually beyond the forum Q&A format and requires system access.

Submitted by jredmond on Mon, 08/20/2018

I was already down the path of your first link, and had started with the presumption that something had to be wrong with TURN and/or STUN. So I had sought a utility to help me understand whether my TURN server (and list of public STUN servers) was at least making replies. I can verify that I'm getting both TURN and STUN resolution. I may not be getting actual TURN relay (i.e. the client receives a relay candidate but can't use it), and that might be the issue, but the logging tools at my disposal don't yield any clues here. My next steps will include traffic capture to at least try to ascertain whether TURN relay is even being attempted.

Specifically the issue is with the connection to the media server. P2P between Connect clients works just fine. Media server connectivity had been working at some point in the past, but not now.

What might be going on here is that Asterisk can't make the switch correctly, and it might not have anything to do with TURN signalling. I do have more than one trunk set up, and I've contemplated that the WebRTC connection might break when a poorly configured trunk is present. I have the system connected to a Barracuda Phone System set up as a peer, which in turn serves about a half dozen extensions. However, all SIP connectivity at the moment is working fine. But I've read of at least one case of FreePBX users complaining about WebRTC failing when a poorly configured trunk was present. It's a theory I've pondered.

I do have a support package with Thirdlane and had reported these issues via my reseller. They've been working on the problem as well.

Submitted by eugene.voityuk on Mon, 08/20/2018

I was just trying to help when you posted to the forum. We were not aware of your problem otherwise or any support arrangements.

Submitted by jredmond on Mon, 08/20/2018

Stumbled upon the solution. I had changed http.conf to bind Asterisk's HTTP server to the LAN interface, so that our line-of-business application could get telephony support through AJAM/AMI. By removing the ability to listen on 127.0.0.1, I broke the WebRTC listener, which uses it. Binding to 0.0.0.0 fixed this and allows both to work, while depending on iptables to keep the bad guys out. (Binding to 127.0.0.1 reduces the attack surface, but restricts the ability to use AJAM/AMI.)
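
For anyone who wants to check their own box for the same mistake, here is a minimal Python sketch of the test I effectively did by hand. It reads the text of http.conf, pulls the bindaddr from the [general] section, and reports whether a listener on that address can be reached via 127.0.0.1. The function names are mine (hypothetical, not part of Asterisk or Thirdlane), and the parser only handles simple key=value lines with ';' comments, so treat it as a rough check rather than a full config parser.

```python
import ipaddress


def parse_bindaddr(conf_text):
    """Return the last bindaddr set in [general], or None if it is not set.

    Hypothetical helper: handles only simple 'key=value' / 'key => value'
    lines and ';' comments, not templates or #include directives.
    """
    section = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        if line.startswith("[") and "]" in line:
            section = line[1:line.index("]")].strip().lower()
            continue
        if section == "general" and "=" in line:
            key, val = line.split("=", 1)
            if key.strip().lower() == "bindaddr":
                value = val.lstrip(">").strip()
    return value


def reachable_from_localhost(bindaddr):
    """True if a socket bound to bindaddr accepts connections to loopback.

    That is the case for the wildcard address (0.0.0.0 or ::) and for
    loopback addresses themselves, but not for a specific LAN address.
    """
    addr = ipaddress.ip_address(bindaddr)
    return addr.is_unspecified or addr.is_loopback
```

With my broken config, parse_bindaddr returned the LAN address and reachable_from_localhost came back False for it, which matches what I saw: anything on the same host that talks to Asterisk over 127.0.0.1 gets refused.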

Submitted by eugene.voityuk on Mon, 08/20/2018

I am glad that you have found a solution. 127.0.0.1 is the default bindaddr due to security concerns. Thirdlane Connect connects to Nginx, which proxies WebSocket connections to Asterisk. Asterisk therefore sees all connections from Thirdlane Connect as coming from localhost, and 'sip show peer' shows them as from localhost as well (we have actually patched Asterisk to pass through the real IP, based on Nginx header information, in all places instead of localhost; this will be available in the next release). This is why it is configured to listen only on 127.0.0.1. Asterisk does not have the best WebSocket implementation and should always be front-ended by a strong proxy. So be careful next time when changing conf files; it is better to ask a question on the forum if you have concerns. The Thirdlane platform has grown since the introduction of Thirdlane Connect, and some internal configuration dependencies might not be obvious. In summary, /var/log/nginx/access.log is a good starting point for anyone troubleshooting connectivity issues in Thirdlane Connect.
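
As a rough illustration of how to use that log, here is a small Python sketch that counts requests per path and HTTP status. It assumes the access log uses Nginx's standard "combined" format, which may not match the log_format configured on a given Thirdlane install, and the function name is only illustrative. A successful WebSocket upgrade is logged with status 101, while 502 generally means Nginx could not reach the upstream it proxies to.

```python
import re

# Leading fields of Nginx's "combined" log format (an assumption; adjust
# the pattern if the server defines a custom log_format).
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Count requests per (path, status); lines that do not match are skipped."""
    counts = {}
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue
        parts = m.group("request").split()
        # Request line is normally "METHOD PATH PROTOCOL"; fall back to raw text.
        path = parts[1] if len(parts) >= 2 else m.group("request")
        key = (path, int(m.group("status")))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

To run it against the real file, open /var/log/nginx/access.log and pass the file object to summarize. Repeated 502 entries on the WebSocket path at the same times the client logs "A sudden disconnect" would suggest looking at what Asterisk is listening on.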