An introduction to WebRTC

What is WebRTC?

WebRTC is an open-source project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities. Using WebRTC, developers can easily add video and audio communication to a website or mobile application through a set of simple application programming interfaces (APIs).

Most popular browsers and mobile platforms support WebRTC without the need for plugins or extra add-ons. Today Chrome, Firefox, Opera, Safari and Microsoft Edge all support WebRTC.


WebRTC originated at Google in 2010, when Google acquired Global IP Solutions (GIPS), a VoIP and videoconferencing software company known for its media frameworks used in VoIP and video calling applications. Google later open-sourced the GIPS technology. The protocols were standardized in the IETF and the browser APIs in the W3C. The result is what we now call WebRTC.

Today the WebRTC initiative is actively supported by Google, Mozilla, Opera and others.

How does it work?

In the past, developing and implementing real-time audio and video communication was complex and time-consuming, and often meant long development cycles and high development costs.

Today a developer can add real-time audio and video communication to a website or a mobile application in a peer-to-peer manner using a JavaScript API. This makes it much easier to develop and integrate real-time communication into your website or mobile application.

WebRTC’s JavaScript API allows your application or website to access the microphone, camera or even the screen of your device and share it remotely. The only limit is your imagination.

Major components of WebRTC

WebRTC includes several JavaScript APIs:

  • getUserMedia – accesses the microphone, camera or even the screen of your device.
  • RTCPeerConnection – enables audio and video communication between peers.
  • RTCDataChannel – allows bidirectional communication of arbitrary data over peer connections.
  • getStats – retrieves a set of statistics about WebRTC sessions.
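As a small sketch of how the first of these APIs is used, the snippet below requests camera and microphone access with getUserMedia. The constraint values are illustrative assumptions, not required settings, and the call only works in a browser over a secure context, so the code feature-detects before calling it:

```javascript
// Media constraints: ask for audio plus roughly 1280x720 video.
// The exact values here are illustrative; any valid
// MediaStreamConstraints object works.
const constraints = {
  audio: true,
  video: { width: { ideal: 1280 }, height: { ideal: 720 } }
};

// getUserMedia is only available in browsers (and only over HTTPS
// or localhost), so feature-detect before calling it.
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia(constraints)
    .then((stream) => {
      // Attach the captured stream to a <video> element on the page.
      const video = document.querySelector("video");
      video.srcObject = stream;
    })
    .catch((err) => {
      // The user denied access, or no matching device was found.
      console.error("getUserMedia failed:", err.name);
    });
}
```

The same constraints object can ask for screen capture instead via getDisplayMedia, but the promise-based flow is the same.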

Applications using WebRTC

Some of the biggest companies in the world, including Google, Facebook and Amazon, have already embraced the technology and implemented it in their applications. Some of these applications are:

  • Google Hangouts
  • Facebook Messenger
  • Amazon Chime

The basic concept of WebRTC

In my last article, An introduction to WebRTC, I covered the question: “What is WebRTC?”. In this article I will cover the basic concepts of WebRTC and explain how WebRTC works.

From a user’s point of view WebRTC simply works like magic, but under the surface WebRTC consists of many moving parts. Some of these parts are:

  • Signaling in WebRTC
  • ICE Candidate
  • Sending Media Data and Arbitrary Data

WebRTC requires two types of network interactions: signaling and media. It is assumed that signaling takes place over an HTTPS connection or a WebSocket, while media goes through media channels.

Signaling in WebRTC

WebRTC is a fully peer-to-peer (P2P) technology for real-time communication. Before peers can communicate with each other, they must exchange network information and negotiate media formats. WebRTC uses the Session Description Protocol (SDP) to exchange this information.

The SDP is assumed to be communicated by the application. This process is called signaling and is not part of WebRTC itself. This means that you have the freedom to use any technology you desire to build your own signaling server.

The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. The rationale is that different applications may prefer to use different protocols, such as the existing SIP or Jingle call signaling protocols, or something custom to the particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the necessary transport and media configuration information necessary to establish the media plane.
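Because signaling is left to the application, the shape of the signaling messages is entirely up to you. Below is a minimal sketch of a JSON envelope of my own invention; the field names and the WebSocket URL in the comment are placeholders, not anything mandated by WebRTC:

```javascript
// A minimal JSON envelope for signaling messages. The "type" and
// "payload" field names are an arbitrary convention for this sketch.
function makeSignal(type, payload) {
  return JSON.stringify({ type, payload });
}

function parseSignal(raw) {
  return JSON.parse(raw);
}

// In a browser, the envelope travels over whatever transport you
// choose, e.g. a WebSocket (the URL below is a placeholder):
//
//   const ws = new WebSocket("wss://signaling.example.com");
//   ws.onmessage = (event) => {
//     const msg = parseSignal(event.data);
//     if (msg.type === "offer") { /* pc.setRemoteDescription(...) */ }
//   };

// Round-trip a fabricated offer through the envelope:
const wire = makeSignal("offer", { sdp: "v=0\r\n..." });
const msg = parseSignal(wire);
```

Anything that can deliver these strings between the two peers — WebSocket, HTTPS polling, even email — can serve as the signaling channel.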

Section 1.1 – JavaScript Session Establishment Protocol

Establishing a session between caller and callee

  1. The caller creates an offer by calling the createOffer() API.
  2. The caller uses the offer to set up its local config via the setLocalDescription() API and sends it to the callee over the signaling channel.
  3. The callee installs the offer using the setRemoteDescription() API. When the offer is accepted, the callee uses the createAnswer() API to generate an appropriate answer.
  4. The callee applies the answer using the setLocalDescription() API, and the answer is sent to the caller over the signaling channel.
  5. The caller installs the answer using the setRemoteDescription() API, and the initial setup is now complete.

Establishing a session in WebRTC is described in Section 1.1 General Design of JSEP in the
JavaScript Session Establishment Protocol.
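The five steps above can be sketched roughly as follows in browser JavaScript. This is a sketch, not a complete implementation: error handling is omitted, and sendToPeer() is a placeholder for your own signaling transport:

```javascript
// Sketch of the caller's side of the offer/answer exchange.
// pc is an RTCPeerConnection; sendToPeer() is a placeholder
// for your own signaling transport.
async function startCall(pc, sendToPeer) {
  // Steps 1-2: create the offer and install it locally.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: "offer", sdp: pc.localDescription.sdp });
}

// Sketch of the callee's side.
async function answerCall(pc, offer, sendToPeer) {
  // Step 3: install the remote offer, then generate an answer.
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  // Step 4: install the answer locally and send it back.
  await pc.setLocalDescription(answer);
  sendToPeer({ type: "answer", sdp: pc.localDescription.sdp });
}

// Step 5, back on the caller: install the received answer.
async function completeCall(pc, answer) {
  await pc.setRemoteDescription(answer);
}
```

Note that every API in the sequence returns a promise, which is why the functions are written with async/await.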

ICE Candidate

Earlier in the article we learned that peers need to exchange network information before they can communicate with each other. This information is known as an ICE candidate and describes the methods by which a peer is able to communicate (directly or through a TURN server). Each peer will propose its best candidates first.

ICE candidates using UDP are considered best, since UDP is faster and media streams are able to recover from interruptions relatively easily. The ICE standard allows TCP candidates as well, but not all browsers support ICE over TCP.

There are different types of ICE candidates. The type defines how the data makes its way from peer to peer. The candidate types are “host”, “prflx”, “srflx” and “relay”; TCP candidates additionally carry a TCP type of “active”, “passive” or “so”.

One of the two peers serves as the Controlling Agent. The Controlling Agent makes the final decision as to which candidate pair to use for the connection.
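On the wire, a candidate is just a line of text, with the type appearing after the "typ" token. A small helper can pull it out; the sample candidate string below is fabricated for illustration:

```javascript
// Extract the candidate type ("host", "srflx", "prflx" or "relay")
// from an ICE candidate line. The type follows the "typ" token.
function candidateType(candidateLine) {
  const match = candidateLine.match(/\btyp\s+(\S+)/);
  return match ? match[1] : null;
}

// A fabricated server-reflexive candidate for illustration:
const sample =
  "candidate:842163049 1 udp 1677729535 203.0.113.5 34567 " +
  "typ srflx raddr 192.168.1.7 rport 54321";
```

For the sample line above, candidateType(sample) returns "srflx" — a server-reflexive address discovered via STUN.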

NAT Traversal using STUN and TURN

Often a peer will be located on a private network behind a NAT (Network Address Translator) and will not have a unique public IP address that it can exchange with other peers.

Without a unique public IP address, a peer cannot communicate with peers on the other side of the NAT. To get around this problem, WebRTC uses STUN (Session Traversal Utilities for NAT) to discover the public IP address and port number that the NAT has allocated for the application’s UDP flows to remote hosts.

Most WebRTC calls are made successfully using STUN, but in some cases a peer may be located behind a firewall or a symmetric NAT, where WebRTC will not be able to make a successful connection using STUN alone.

In situations where WebRTC cannot make a successful connection using STUN, it can use a TURN (Traversal Using Relays around NAT) server as a fallback to relay video, audio and arbitrary data between peers. A TURN server with a public IP address can be contacted by peers even if they are behind firewalls or proxies.

STUN and TURN are widely used tools for communications protocols to detect and traverse NATs that are located between communication endpoints.
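In practice you point the peer connection at your STUN and TURN servers through its configuration object. A sketch follows: the Google STUN server listed is a well-known public one, while the TURN URL and credentials are placeholders you would replace with your own:

```javascript
// ICE server configuration. stun.l.google.com is a well-known
// public STUN server; the TURN URL, username and credential are
// placeholders for your own TURN deployment.
const rtcConfig = {
  iceServers: [
    { urls: "stun:stun.l.google.com:19302" },
    {
      urls: "turn:turn.example.com:3478",
      username: "user",
      credential: "secret"
    }
  ]
};

// In a browser you would pass this to the peer connection:
//   const pc = new RTCPeerConnection(rtcConfig);
```

With both entries present, ICE will try direct and STUN-derived candidates first and fall back to relaying through TURN only when nothing else connects.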

Sending Audio, Video and Arbitrary Data

WebRTC is built with low-latency audio and video communication in mind and uses codecs to compress and decompress audio and video data. WebRTC supports VP8 and H.264 for video, and G.711 and Opus for audio. All media data is sent over the Secure Real-time Transport Protocol (SRTP).