The basic concept of WebRTC

In my last article, An introduction to WebRTC, I covered the question “What is WebRTC?”. In this article I will cover the basic concepts of WebRTC and explain how it works.

From a user's point of view WebRTC works like magic, but under the surface it consists of many moving parts. Some of these parts are:

  • Signaling in WebRTC
  • ICE Candidate
  • Sending Media Data and Arbitrary Data

WebRTC requires two types of network interaction: signaling and media. Signaling is assumed to take place over an HTTPS connection or a WebSocket, while media flows through media channels.

Signaling in WebRTC

WebRTC is a fully peer-to-peer (P2P) technology for real-time communication. Before peers can communicate with each other, they must exchange network information and negotiate media formats. WebRTC uses the Session Description Protocol (SDP) to describe this information.

The SDP is assumed to be communicated by the application. This process is called signaling and is not part of WebRTC. This means that you have the freedom to use any technology you desire for building your own signaling server.
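
Because signaling is left to the application, the message format is yours to choose. Here is a minimal sketch, assuming a JSON envelope sent over a WebSocket or HTTPS. The `SignalMessage` type and the `encodeSignal` and `decodeSignal` helpers are hypothetical names made up for illustration; WebRTC does not define them.

```typescript
// Hypothetical signaling envelope; WebRTC itself does not define this format.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

// Serialize a message for the signaling channel (WebSocket, HTTPS, ...).
function encodeSignal(msg: SignalMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming message; returns null if it is malformed.
function decodeSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const kind = rec["kind"];
  const sdp = rec["sdp"];
  const candidate = rec["candidate"];
  if (kind === "offer" && typeof sdp === "string") return { kind: "offer", sdp };
  if (kind === "answer" && typeof sdp === "string") return { kind: "answer", sdp };
  if (kind === "candidate" && typeof candidate === "string") {
    return { kind: "candidate", candidate };
  }
  return null;
}
```

Validating on receipt means a malformed or unexpected message is rejected before it reaches the peer connection.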

The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. The rationale is that different applications may prefer to use different protocols, such as the existing SIP or Jingle call signaling protocols, or something custom to the particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the necessary transport and media configuration information necessary to establish the media plane.

Section 1.1 – JavaScript Session Establishment Protocol

Establishing a session between caller and callee

  1. The caller creates an offer by calling the createOffer() API.
  2. The caller uses the offer to set up its local config via the setLocalDescription() API and sends it to the callee over the signaling channel.
  3. The callee installs the offer using the setRemoteDescription() API. When the offer is accepted, the callee uses the createAnswer() API to generate an appropriate answer.
  4. The callee applies the answer using the setLocalDescription() API and the answer is sent to the caller over the signaling channel.
  5. The caller installs the answer using the setRemoteDescription() API, and the initial setup is now complete.
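
The five steps above can be sketched as three small functions. This is an illustration under stated assumptions, not a complete implementation. `PeerLike` is a hypothetical, simplified interface that reuses the method names of the browser's `RTCPeerConnection`, and `send` stands in for whatever signaling channel the application uses.

```typescript
// Simplified session description: a type plus the SDP text.
interface SessionDesc {
  type: "offer" | "answer";
  sdp: string;
}

// Hypothetical, simplified subset of RTCPeerConnection, declared here so the
// sketch does not depend on browser typings.
interface PeerLike {
  createOffer(): Promise<SessionDesc>;
  createAnswer(): Promise<SessionDesc>;
  setLocalDescription(desc: SessionDesc): Promise<void>;
  setRemoteDescription(desc: SessionDesc): Promise<void>;
}

// Stand-in for the application's signaling channel.
type Send = (desc: SessionDesc) => void;

// Steps 1-2: the caller creates an offer, applies it locally and sends it.
async function startCall(caller: PeerLike, send: Send): Promise<void> {
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  send(offer);
}

// Steps 3-4: the callee installs the offer, creates and applies an answer,
// and sends the answer back.
async function acceptCall(callee: PeerLike, offer: SessionDesc, send: Send): Promise<void> {
  await callee.setRemoteDescription(offer);
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  send(answer);
}

// Step 5: the caller installs the answer, completing the initial setup.
async function completeCall(caller: PeerLike, answer: SessionDesc): Promise<void> {
  await caller.setRemoteDescription(answer);
}
```

In a browser the caller and callee would each hold an `RTCPeerConnection`, and `send` would write to the signaling server rather than to a local callback.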

Establishing a session in WebRTC is described in Section 1.1, General Design of JSEP, of the JavaScript Session Establishment Protocol.

ICE Candidate

Earlier in the article we learned that peers need to exchange network information before they can communicate with each other. Each piece of this information is known as an ICE (Interactive Connectivity Establishment) candidate and describes one way the peer is able to communicate (directly or through a TURN server). Each peer will propose its best candidates first.
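
What “best” means is expressed as a priority number attached to each candidate. The sketch below follows the formula and the type preference values recommended in the ICE specification (RFC 8445, section 5.1.2.1). The name `candidatePriority` is made up for this example, and real implementations are free to weigh candidates differently.

```typescript
// Candidate types defined by ICE: host (a local address), prflx (peer
// reflexive), srflx (server reflexive, learned via STUN) and relay (an
// address allocated on a TURN server).
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445; implementations may choose others.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(
  type: CandidateType,
  localPreference: number,
  componentId: number
): number {
  return (
    Math.pow(2, 24) * TYPE_PREFERENCE[type] +
    Math.pow(2, 8) * localPreference +
    (256 - componentId)
  );
}

// Sort so that the highest-priority ("best") candidate comes first.
const ranked = (["relay", "host", "srflx"] as CandidateType[])
  .map((type) => ({ type, priority: candidatePriority(type, 65535, 1) }))
  .sort((a, b) => b.priority - a.priority);

console.log(ranked.map((c) => c.type).join(" > ")); // host > srflx > relay
```

With these values a direct host candidate always outranks a relayed one, which matches the idea that relaying through a TURN server is a last resort.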

ICE candidates using UDP are considered best, since UDP is faster and media streams are able to recover from interruptions relatively easily. The ICE standard allows TCP candidates as well, but not all browsers support ICE over TCP.

There are different types of ICE candidates. The type defines how the data makes its way from peer to peer. The candidate types are “host”, “srflx” (server reflexive), “prflx” (peer reflexive) and “relay”. TCP candidates additionally carry a TCP type of “active”, “passive” or “so”. Read more about the different types of ICE candidates here.
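
To make the fields concrete, here is a sketch of a parser for the candidate line that peers exchange during signaling. It assumes the standard `candidate:` attribute layout from the ICE SDP specification; `ParsedCandidate` and `parseCandidate` are names invented for this example, and the sample line uses documentation-only IP addresses.

```typescript
interface ParsedCandidate {
  foundation: string;
  componentId: number;
  protocol: string; // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  type: string; // "host", "srflx", "prflx" or "relay"
  tcpType?: string; // "active", "passive" or "so" (TCP candidates only)
}

// Layout: candidate:<foundation> <component> <transport> <priority>
//         <address> <port> typ <type> [<name> <value> ...]
function parseCandidate(line: string): ParsedCandidate | null {
  const fields = line
    .trim()
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") return null;
  const parsed: ParsedCandidate = {
    foundation: fields[0],
    componentId: Number(fields[1]),
    protocol: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    type: fields[7],
  };
  // Remaining fields are name/value pairs such as "raddr", "rport" or "tcptype".
  for (let i = 8; i + 1 < fields.length; i += 2) {
    if (fields[i] === "tcptype") parsed.tcpType = fields[i + 1];
  }
  return parsed;
}

const sampleLine =
  "candidate:842163049 1 udp 1694498815 203.0.113.5 46154 typ srflx raddr 10.0.0.2 rport 46154";
const sampleParsed = parseCandidate(sampleLine);
console.log(sampleParsed ? sampleParsed.type : "invalid"); // srflx
```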

One of the two peers serves as the controlling agent. The controlling agent makes the final decision as to which candidate pair to use for the connection.

NAT Traversal using STUN and TURN

Often a peer will be located on a private network behind a NAT (Network Address Translator) and will not have a unique public IP address that it can exchange with other peers.

Without a unique public IP address, a peer cannot be reached directly by peers on the other side of the NAT. To get around this problem, WebRTC uses STUN (Session Traversal Utilities for NAT) to discover the public IP address and port number that the NAT has allocated for the application's UDP flows to remote hosts.

Most WebRTC calls are made successfully using STUN, but in some cases a peer is located behind a firewall or a symmetric NAT, where WebRTC cannot establish a connection using STUN alone.

In these situations WebRTC can use a TURN (Traversal Using Relays around NAT) server as a fallback to relay video, audio and arbitrary data between peers. A TURN server with a public IP address can be contacted by peers even if they are behind firewalls or proxies.
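
STUN and TURN servers are handed to the peer connection as configuration. The sketch below builds such a configuration object. The `iceServers` list with `urls`, `username` and `credential` mirrors what the browser's `RTCPeerConnection` constructor accepts, while the `buildPeerConfig` helper, the host names and the credentials are placeholders invented for this example.

```typescript
// Simplified shape of one entry in the `iceServers` list.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Simplified shape of the configuration passed to a peer connection.
interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical helper: STUN is listed for address discovery, and TURN is
// added as the relay fallback when credentials are available.
function buildPeerConfig(
  stunUrl: string,
  turn?: { url: string; username: string; credential: string }
): PeerConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Placeholder hosts and credentials; replace them with your own servers.
const peerConfig = buildPeerConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-secret",
});
console.log(JSON.stringify(peerConfig, null, 2));
```

In a browser, an object of this shape would be passed as `new RTCPeerConnection(peerConfig)`.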

STUN and TURN are widely used tools for communications protocols to detect and traverse NATs that are located between communication endpoints.

Sending Audio, Video and Arbitrary Data

WebRTC is built with low-latency audio and video communication in mind and uses codecs to compress and decompress audio and video data. WebRTC supports VP8 and H.264 for video and G.711 and Opus for audio. All media data is sent over the Secure Real-time Transport Protocol (SRTP).
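
The codecs a peer is willing to use are listed in its SDP, one `a=rtpmap` line per codec. As a small sketch, the hypothetical `listCodecs` helper below pulls the codec names out of a hand-written SDP fragment. The fragment is illustrative only, and real session descriptions are much longer. G.711 appears in SDP under the name PCMU (or PCMA).

```typescript
// Extract codec names from lines of the form
// "a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]".
function listCodecs(sdp: string): string[] {
  const codecs: string[] = [];
  const lines = sdp.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const match = /^a=rtpmap:\d+ ([^/]+)\//.exec(lines[i]);
    if (match && codecs.indexOf(match[1]) === -1) {
      codecs.push(match[1]);
    }
  }
  return codecs;
}

// Hand-written SDP fragment for illustration.
const sdpFragment = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

console.log(listCodecs(sdpFragment).join(", ")); // opus, PCMU, VP8, H264
```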