How Can I Use SIP with WebRTC?
If you’re asking this question, then chances are you either have an existing SIP infrastructure and are looking for a way to interconnect with WebRTC-enabled endpoints, or you have an existing web application and are looking for a way to interconnect with the telephone network.
In both cases, the goal is the same - interconnection - and that’s what SIP is all about.
Before we can answer the main question, though, we need to understand clearly what SIP actually is, and to do that, we need to get a little bit technical.
What is SIP?
The Session Initiation Protocol, or SIP for short, has been around since the late 1990s. It is a text-based signalling protocol used to manage media sessions between two IP-connected endpoints. It’s frequently used in Voice-over-IP (VoIP) communications, which has driven significant growth in the scope of what SIP can do. What started out as a relatively simple specification is now a large collection of RFCs, proposals, and extensions.
In the end, though, the basics of SIP are quite simple. Its structure is very similar to HTTP; each SIP message consists of a start line (a request line or a status line), a number of headers, each on its own line, and a blank line, followed by an optional message body.
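The message layout described above can be sketched in a few lines of Python. This is only an illustration, not a full SIP parser: the function name `parse_sip_message`, the addresses, and the one-line placeholder body are all made up for the example, and details such as folded or compact-form headers are ignored.

```python
# Minimal sketch: split a SIP message into start line, headers, and body.
# All names and addresses below are hypothetical examples.

def parse_sip_message(raw):
    """Return (start_line, headers, body) for a raw SIP message string."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    # Keep headers as an ordered list of pairs, since a header such as Via
    # may legitimately appear more than once.
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


EXAMPLE_BODY = "v=0\r\n"  # placeholder body, not a complete session description

EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: <sip:bob@example.com>\r\n"
    "From: <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@client.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    f"Content-Length: {len(EXAMPLE_BODY)}\r\n"
    "\r\n"
) + EXAMPLE_BODY

start_line, headers, body = parse_sip_message(EXAMPLE_INVITE)
print(start_line)
for name, value in headers:
    print(f"{name} => {value}")
```

As with HTTP, everything before the first blank line is metadata about the request, and everything after it is the payload.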
When a SIP message carries a body during call setup, that body is typically written in the Session Description Protocol (SDP). SDP describes the media streams that will eventually flow between the two endpoints, like the number and type of streams (e.g. audio, video) and what encodings are allowed (e.g. G.711, Opus, VP8, H.264).
In a typical call setup, two SDP messages will be sent - an offer and an answer. The call initiator sends the offer first, describing as much as it can about the soon-to-be-established session and the capabilities/desires of the offering side. The call receiver will then (if it accepts the call) send an answer back, including information about its own capabilities/desires within the context of what was offered.
If all goes well, this is enough information for the two sides to form a connection and start sending and receiving data.
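To make the offer/answer exchange a little more concrete, here is a rough Python sketch of how an answering side might pick codecs from an offer. The SDP text, addresses, ports, and function names are invented for illustration, and the logic is deliberately simplified: it only reads `m=` and `a=rtpmap` lines, and it assumes at most one section per media type. (G.711 appears in SDP as PCMU and PCMA.)

```python
# Simplified offer/answer sketch: keep only the offered codecs we support.
# The SDP below is a hand-written example, not captured from a real call.
from typing import Dict, Iterable, List

EXAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=- 20518 0 IN IP4 203.0.113.1",
    "s=-",
    "c=IN IP4 203.0.113.1",
    "t=0 0",
    "m=audio 54400 RTP/AVP 0 8 96",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:96 opus/48000/2",
    "m=video 54402 RTP/AVP 97 98",
    "a=rtpmap:97 VP8/90000",
    "a=rtpmap:98 H264/90000",
]) + "\r\n"


def offered_codecs(sdp: str) -> Dict[str, List[str]]:
    """Map each media type (audio, video) to the codec names it offers."""
    codecs: Dict[str, List[str]] = {}
    current = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            # "m=audio 54400 RTP/AVP 0 8 96" -> media type "audio"
            current = line[2:].split()[0]
            codecs[current] = []
        elif line.startswith("a=rtpmap:") and current is not None:
            # "a=rtpmap:96 opus/48000/2" -> encoding name "opus"
            _payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[current].append(encoding.split("/")[0])
    return codecs


def accepted_codecs(offer_sdp: str, supported: Iterable[str]) -> Dict[str, List[str]]:
    """Return the offered codecs the answerer supports, in offer order."""
    wanted = {name.upper() for name in supported}
    return {
        media: [name for name in names if name.upper() in wanted]
        for media, names in offered_codecs(offer_sdp).items()
    }


print(accepted_codecs(EXAMPLE_OFFER, {"PCMU", "opus", "VP8"}))
```

A real answer would also carry the answerer's own addresses, ports, and attributes; the point here is only that the answer is shaped by what was offered.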
What Does SIP Have to Do with WebRTC?
WebRTC is very naturally related to all of this. Like SIP, it is intended to support the creation of media sessions between two IP-connected endpoints. Like SIP-based calls, its connections use the Real-time Transport Protocol (RTP) for packets in the media plane once signalling is complete. Like SIP, it uses SDP to describe those sessions.
Where it differs is in two key areas:
- It does not mandate the use of SIP messages in the signalling plane. In fact, it makes no decision at all regarding the use of SIP or any other signalling protocol. The actual signalling (sending/receiving) of the SDP messages is intentionally left to the application. SIP, on the other hand, defines a very specific message format that must be used to encapsulate and send SDP messages for the purposes of call setup.
- It mandates the use of some SIP-optional features in the media plane. Specifically:
- ... the use of specific codecs. G.711 and Opus are required for audio, whereas VP8 and H.264/AVC are required for video. (Note that technically, web browsers are the only ones required to support both VP8 and H.264, but practically speaking, every WebRTC-enabled endpoint should be looking at both for maximum compatibility and performance.)
- ... the use of the Secure Real-time Transport Protocol (SRTP) profile to provide encryption and message authentication for media packets.
- ... the use of Datagram Transport Layer Security (DTLS) to generate the keys used by SRTP. The DTLS certificate fingerprint(s) must be signalled in the SDP. (Note that the use of Security Descriptions (SDES) was formerly supported, but is no longer allowed.)
- ... the use of Interactive Connectivity Establishment (ICE), Session Traversal Utilities for NAT (STUN), and Traversal Using Relays around NAT (TURN) for network traversal.
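Several of the media-plane requirements listed above leave visible traces in the SDP, so a quick check can hint at whether a given offer comes from (or is suitable for) a WebRTC endpoint. The sketch below is an assumption-laden illustration rather than a conformance test: the function name and issue messages are made up, the fingerprint is an obviously fake placeholder, and the profile check simply looks for an SRTP profile name ending in `SAVPF` on audio and video lines.

```python
# Rough sketch: look for SDP markers of the WebRTC media-plane requirements.
# This is a heuristic illustration, not a standards-conformance check.
from typing import List


def webrtc_media_issues(sdp: str) -> List[str]:
    """Return a list of human-readable problems found in the SDP."""
    lines = sdp.splitlines()
    issues = []
    # DTLS-SRTP: the certificate fingerprint must be signalled in the SDP.
    if not any(line.startswith("a=fingerprint:") for line in lines):
        issues.append("missing DTLS certificate fingerprint")
    # SDES keying shows up as a=crypto lines, which WebRTC no longer allows.
    if any(line.startswith("a=crypto:") for line in lines):
        issues.append("SDES keying (a=crypto) present")
    # ICE: both a username fragment and a password are needed.
    has_ufrag = any(line.startswith("a=ice-ufrag:") for line in lines)
    has_pwd = any(line.startswith("a=ice-pwd:") for line in lines)
    if not (has_ufrag and has_pwd):
        issues.append("missing ICE credentials")
    # SRTP: audio/video sections should use a secure profile, not RTP/AVP.
    for line in lines:
        if line.startswith("m=audio") or line.startswith("m=video"):
            fields = line.split()
            media, proto = fields[0][2:], fields[2]
            if not proto.endswith("SAVPF"):
                issues.append(f"{media} uses non-SRTP profile {proto}")
    return issues


FAKE_FINGERPRINT = ":".join(["AB"] * 32)  # placeholder, not a real hash

WEBRTC_STYLE_SDP = "\r\n".join([
    "v=0",
    "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    f"a=fingerprint:sha-256 {FAKE_FINGERPRINT}",
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "c=IN IP4 0.0.0.0",
    "a=ice-ufrag:F7gI",
    "a=ice-pwd:x9cml/YzichV2+XlhiMu8g",
    "a=setup:actpass",
    "a=rtpmap:111 opus/48000/2",
]) + "\r\n"

LEGACY_SIP_SDP = "\r\n".join([
    "v=0",
    "o=- 20518 0 IN IP4 203.0.113.1",
    "s=-",
    "c=IN IP4 203.0.113.1",
    "t=0 0",
    "m=audio 54400 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"

print(webrtc_media_issues(WEBRTC_STYLE_SDP))
print(webrtc_media_issues(LEGACY_SIP_SDP))
```

The legacy example fails on every count that the WebRTC-style one passes, which is exactly the gap a media gateway has to bridge.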
Putting aside all arguments of whether these differences are good or not, they exist, and so we have to work with them. Let’s take a look at the signalling plane and the media plane independently.
The Signalling Plane
Running on the assumption that the existing SIP infrastructure isn’t going to switch to a different signalling protocol (not a big stretch), the WebRTC side of things is going to have to learn how to talk SIP.
There are two ways to achieve this:
- Use SIP as the signalling stack for your WebRTC-enabled application.
- Use another signalling solution for your WebRTC-enabled application, but add in a signalling gateway to translate between this and SIP.
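As a very rough idea of what the second option involves, the sketch below translates an application-level offer into a SIP INVITE. Everything about the input is an assumption: the JSON shape (`type`, `from`, `to`, `sdp`), the function name, and the gateway host name are hypothetical, and a real gateway would also have to handle responses, retransmissions, dialogs, and the reverse direction.

```python
# Hypothetical signalling-gateway sketch: app-level JSON offer -> SIP INVITE.
# The JSON message format here is invented for illustration only.
import json
import uuid


def offer_to_invite(message_json: str, gateway_host: str) -> str:
    """Wrap the SDP from an application-level offer in a SIP INVITE."""
    msg = json.loads(message_json)
    if msg.get("type") != "offer":
        raise ValueError("only offer messages map to an INVITE")
    caller, callee, sdp = msg["from"], msg["to"], msg["sdp"]
    caller_user = caller.split("@")[0]
    # Content-Length counts bytes of the body, not characters.
    body_length = len(sdp.encode("utf-8"))
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {gateway_host};branch=z9hG4bK{uuid.uuid4().hex}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{callee}>",
        f"Call-ID: {uuid.uuid4().hex}@{gateway_host}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller_user}@{gateway_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {body_length}",
    ]
    # A blank line separates the headers from the SDP body.
    return "\r\n".join(lines) + "\r\n\r\n" + sdp


example_message = json.dumps({
    "type": "offer",
    "from": "alice@example.org",
    "to": "bob@example.com",
    "sdp": "v=0\r\n",  # placeholder; a real offer would be a full SDP
})

print(offer_to_invite(example_message, "gateway.example.net"))
```

Note that the SDP is passed through untouched here. That only works if the SIP side can accept WebRTC's media requirements, which is a media-plane question.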
Which option is better for you depends greatly on your existing infrastructure and your plans to expand.
Do you have an existing SIP infrastructure?
Do you have a selective forwarding unit (SFU) or multipoint control unit (MCU) to help scale out your WebRTC connections?
Does your web application already use a system like WebSync or XMPP for real-time text communications?
On which client platforms does your application run?
Do you have a SIP signalling stack that runs on those platforms?
The right path for you will depend heavily on the answers to these questions, and probably more.
Our professional services team is available to help you with an iRTC Project Assessment that can analyze your current layout and future plans to advise on the best solution for you.
We are currently hard at work making this all significantly easier by adding support for SIP to the gateway included with our upcoming LiveSwitch SFU/MCU.
With support for both WebSync (for web endpoints) and SIP (for telephony endpoints) out of the box, plus the ability to code in support for third-party signalling systems, the LiveSwitch gateway will make signalling interconnection one less problem to worry about.
The Media Plane
If you don’t have an existing SIP infrastructure, then the right choice may be to simply select SIP technology that is listed as being WebRTC-compatible.
Many SIP gateways (e.g. FreeSWITCH) and SIP trunking services (e.g. Voxbone) can be configured to use DTLS-SRTP, ICE, and the codecs mandated by WebRTC.
If you already have an existing SIP infrastructure, then it may be necessary to add a Session Border Controller (SBC), such as the SBC product range from Sonus, or a similar device that can act as a media gateway between the WebRTC and VoIP endpoints. In that arrangement, the SBC-to-WebRTC leg satisfies the requirements of WebRTC while the SBC-to-VoIP leg caters to the needs of the VoIP endpoints.
If you have an SFU/MCU assisting you with scaling out WebRTC connections, then the media server may be able to act as this gateway.
Our upcoming LiveSwitch SFU/MCU is designed to allow direct communications with SIP clients via the MCU component.
Again, the right path for you may require some discussion and an assessment of your needs.
If you’re not sure which way to go, let our professional services team be your guide! Contact us for a consultation to discuss your RTC needs.