One of the more common questions I’ve gotten over the last year or so has been: now that WebRTC is available in major browsers, isn’t SIP going to go away?
Firstly, it’s important to differentiate between the common usages of SIP:
- Access Layer (between SIP endpoints like softphones and desk phones)
- Internal Network (within a service provider’s network)
- Interconnects / Federation (between service providers)
As WebRTC is really aimed at the access layer, let’s rephrase this question to: Is WebRTC going to replace SIP at the access layer?
What is SIP?
Before trying to answer this surprisingly complex question, let’s define what SIP is for the sake of this discussion: SIP is a signalling protocol which defines a mechanism for setting up offer/answer sessions – phone calls – using SDP (along with some other stuff, but let’s ignore that for now). It’s mostly transport agnostic: the main protocol specification defines UDP, TCP, and TLS-based transports (see RFC 3261), SCTP is defined separately (see RFC 4168), and there have even been some attempts at DTLS (see draft-jennings-sip-dtls-05).
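As a rough illustration only (headers trimmed, and the names and addresses are invented), a stripped-down SIP INVITE carrying an SDP offer looks something like this:

    INVITE sip:bob@example.com SIP/2.0
    Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds
    Max-Forwards: 70
    From: Alice <sip:alice@example.com>;tag=1928301774
    To: Bob <sip:bob@example.com>
    Call-ID: a84b4c76e66710@client.example.com
    CSeq: 314159 INVITE
    Contact: <sip:alice@client.example.com>
    Content-Type: application/sdp

    v=0
    o=alice 2890844526 2890844526 IN IP4 client.example.com
    s=-
    c=IN IP4 192.0.2.10
    t=0 0
    m=audio 49170 RTP/AVP 0
    a=rtpmap:0 PCMU/8000

The SDP body at the bottom is the offer; the answer comes back the same way in the response, and that offer/answer exchange is what sets up the media streams.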
What is WebRTC?
Unlike SIP, WebRTC is a browser (javascript) API for interacting with a local media stack, defined by the W3C. It doesn’t provide any transport mechanisms, nor does it have the concept of a phone call, or a way of registering to a network. It simply provides the APIs for establishing media streams (audio, video, etc.) via SDP offers/answers. There is no specification for how the browser is informed that there is a phone call, or how the SDP offers/answers are delivered into the javascript application; how the javascript application calls these APIs based on external events is not defined by WebRTC.
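To make that division of responsibility concrete, here is a minimal sketch of the offer side in browser javascript. The STUN server address and sendOfferToServer() are placeholders I’ve invented, precisely because WebRTC itself says nothing about how the offer leaves the page:

    // Minimal sketch of generating an SDP offer with the WebRTC API.
    async function startCall() {
      // The STUN server here is a made-up example address.
      const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.example.com' }] });

      // Grab a local audio stream and attach its tracks to the peer connection.
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      stream.getTracks().forEach(track => pc.addTrack(track, stream));

      // Generate an SDP offer and apply it locally; this also starts ICE gathering.
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);

      // WebRTC stops here: delivering the offer to the far side is entirely the
      // application's job. sendOfferToServer() is a hypothetical placeholder,
      // not part of any browser API.
      sendOfferToServer(pc.localDescription.sdp);
      return pc;
    }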
Along with WebRTC, browsers also now support Web Sockets. For security reasons, javascript applications loaded from untrusted locations (like the internet) are not allowed to establish raw TCP connections, or send/receive UDP packets. Until websockets, the only standard method of getting data into or out of a browser was via HTTP, which doesn’t support bidirectional communication (although long polling provided a workaround, albeit an inefficient one). Web Sockets provide a mechanism for untrusted javascript applications to open a bidirectional TCP/TLS socket between a browser application and a remote server using a standardized API. However, they don’t let the application read and write raw data to the socket; instead, the data is wrapped inside higher-level websocket frames, which websocket servers implement.
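For illustration, this is roughly all the websocket API gives you. The endpoint URL and the JSON framing below are made up, because what goes inside the frames is entirely up to the application and its server:

    // Minimal sketch of the browser WebSocket API. The endpoint is invented,
    // and the browser only moves opaque websocket frames back and forth.
    const ws = new WebSocket('wss://voice.example.com/signalling');

    ws.onopen = () => {
      // Whatever goes inside a frame is up to the application and its server.
      ws.send(JSON.stringify({ type: 'hello', user: 'alice' }));
    };

    ws.onmessage = (event) => {
      console.log('frame from server:', event.data);
    };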
The Problem with SIP
So at this point we have a browser with a media API (and offer/answer semantics), and the potential for creating a bidirectional socket to a server. But it’s missing a critical part of making/receiving voice/video calls: the protocol – the messages – sent between the server and the WebRTC-enabled client.
SIP came about – as is the case with most standardized protocols – out of a need to let different entities implement various parts of a system and have a common, well-defined mechanism for communicating with each other. The protocol allows any SIP-enabled device to work with a service provider that supports SIP (well, almost…that’s the idea anyway) because they both speak and understand the same protocol. They know what each message means and how that maps to what’s happening with a phone call. Without that, you wouldn’t be able to take a Polycom, Cisco, Yealink, Panasonic, or any other SIP-compatible phone and make it work with the Jive Cloud platform unless the phone manufacturer had implemented whatever proprietary protocol we invented – or Jive had implemented the manufacturer’s proprietary protocol.
It is important to note that this is because Jive’s code runs on Jive’s servers, and Polycom’s code runs on Polycom’s phones. Jive doesn’t have any way to run code we write on Polycom phones, and Polycom has no way to run its code on Jive’s platform. The same is true of any generic VoIP (SIP) application on your laptop, desktop, or mobile phone.
There are many times we have wished we could have! Every vendor does things slightly differently, and we’re really at the mercy of the phone vendor when we discover bugs in how they implement SIP. It often involves multiple months of waiting for a new version of the firmware, which we then have to deploy out to customers after extensive testing. It’s not a quick turnaround.
Additionally, there are many features we’d love to offer customers which we can’t because the phone simply doesn’t support them. We can’t make a phone from vendor X do custom phone feature Y unless there is a defined protocol mechanism for doing it. Some vendors implement some custom functionality and protocols (or extensions to SIP), but that means the feature is only available to customers who purchased that specific phone model/version. In short, Jive can only innovate to a certain point without requiring vendor specific enhancements. Sadly, there are no strong business drivers for the phone vendors to implement hundreds of small features for a single service provider.
The Advantage of WebRTC
Here is where WebRTC and websockets are exciting: the phone client becomes a javascript application, untrusted code downloaded from whatever site you visit. So when you go to a URL, that javascript application has access to these APIs. The logic for how the web page looks, the communication protocol, and the reliability, redundancy, and security are all implemented by the code downloaded from the service provider when you go to the web page. How messages sent/received over the websocket map to phone calls, putting people on hold, transferring them, etc., is also all up to that javascript downloaded from the remote web page. Updating the firmware on your web client phone is as simple as reloading the web page. No more “well, that’s a bug in your phone – we’ll work with the vendor to fix it.” Instead, the service provider can test all the use cases specific to their network before deploying.
Today, SIP is commonly used as the protocol run over websockets, with the offers/answers fed into the WebRTC API to negotiate audio/video streams. This is mostly because phone providers already have a publicly accessible, scalable, secure, and reliable SIP infrastructure – so adding another transport layer (websockets) on top is pretty simple. Also, there are a number of open source SIP libraries available, making deploying a SIP-enabled WebRTC client quick and easy.
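Whatever framing is used, the glue code on the client ends up looking roughly like this sketch. It reuses the pc and ws objects from the earlier sketches, and extractSdpOffer()/buildAnswerMessage() are hypothetical helpers standing in for the chosen signalling protocol, SIP or otherwise:

    // Sketch of handling an incoming call: an SDP offer arrives over the
    // websocket (perhaps wrapped in a SIP INVITE), gets fed into the WebRTC
    // API, and the resulting SDP answer is sent back over the same socket.
    // extractSdpOffer() and buildAnswerMessage() are invented helpers whose
    // contents depend entirely on the signalling protocol the provider chose.
    ws.onmessage = async (event) => {
      const sdpOffer = extractSdpOffer(event.data);
      if (!sdpOffer) return;

      await pc.setRemoteDescription({ type: 'offer', sdp: sdpOffer });
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);

      ws.send(buildAnswerMessage(pc.localDescription.sdp));
    };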
So, is SIP Going Away?
Over the longer term, will people carry on using SIP as the signalling protocol over websockets when doing WebRTC? Who knows, and frankly, it’s a totally irrelevant question when it comes to a customer interacting with their service provider’s web client.
The service provider (like Jive) writes the code that runs on the client and the server, so what protocol they use ultimately makes no difference. I’m sure some will use SIP, and some will use their own protocols, especially the more traditional telcos that don’t do any internal development and instead outsource everything to the likes of Ericsson, Cisco, Alcatel Lucent, Nokia Networks, Acme Packet, etc. I believe there will continue to be strong drivers from the telcos to use a standard protocol so there isn’t vendor lock-in.
Not to mention that there is no end in sight to customers wanting phones on their desks, as well as the incredible number already deployed over the last 10 years which aren’t going away anytime soon. It would be great if phones started to support WebRTC in a javascript runtime, along with an API for accessing the screen, buttons, handset, and speaker, so that we could implement the client ourselves and innovate at a faster rate – but every time I talk to the phone vendors about it, they stare blankly at me. I have to remind myself that these guys are mostly from an old-school embedded hardware/firmware world.
To enable innovation, service providers will continue to offer basic SIP signalling. Without it, it won’t be possible to take a third-party voice/video application and start using it with your service provider, as each service provider would need custom support added into the application. Without a standardized protocol like SIP, you wouldn’t be able to take an off-the-shelf product like a door entry phone and make it work with anything that isn’t written by the same vendor as the phone itself.
It would be nice to think that in a few years, after we’ve had more experience deploying WebRTC, some better alternatives will come along and be as commonly implemented as SIP. I would welcome the change, as it would enable us to use off-the-shelf servers and client libraries to take care of many of the complexities of asynchronous communications protocols. But we’re still years away from that, and we’ve already got SIP.
I also don’t see too many use cases right now where there needs to be third-party integration from one web app to another to interact with different networks/federations. There is going to be a lot of work to do here to allow such federation/integration.
Conclusion
So, back to the question: is SIP going away between code running in web browsers and the service provider’s network? Perhaps, because each provider can do their own thing without regard to anyone else. SIP is a beast, and there are only a handful of great software engineers in the world who truly understand its subtleties (some of them work at Jive, and – if you’re interested – take a look at http://jive.com/careers/). But for now, I think SIP is here to stay.