WebRTC is an open-source project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities. Using WebRTC, developers can easily add video and audio communication to a website or mobile application through a set of simple application programming interfaces (APIs).
Most popular browsers and mobile platforms support WebRTC without the need for plugins or extra add-ons. Today Chrome, Firefox, Opera, Safari and Microsoft Edge all support WebRTC.
WebRTC originated at Google in 2010, when Google acquired Global IP Solutions (GIPS), a VoIP and videoconferencing software company known for the media frameworks it built for developing VoIP and video calling applications. Google later open-sourced the GIPS technology. The protocols were standardized in the IETF and the browser APIs in the W3C, and the result is now called WebRTC.
Today the WebRTC initiative is actively supported by Google, Mozilla, Opera and others.
How does it work?
In the past, developing and implementing real-time audio and video communication was complex and time-consuming, often meaning long development cycles and high development costs.
Major components of WebRTC
getUserMedia – used to access the microphone, camera or even the screen of your device
RTCPeerConnection – enables audio and video communication between peers.
RTCDataChannel – allows bidirectional communication of arbitrary data over peer connections
getStats – retrieves a set of statistics about WebRTC sessions
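A minimal sketch of how these components fit together in the browser (the constraints and channel name below are illustrative choices, not required values):

```javascript
// Media constraints passed to getUserMedia: request audio plus 720p video.
const constraints = { audio: true, video: { width: 1280, height: 720 } };

// Guarded so the sketch only runs where the browser APIs actually exist.
if (typeof navigator !== 'undefined' && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia(constraints)
    .then((stream) => {
      // Create a peer connection and add each captured track to it.
      const pc = new RTCPeerConnection();
      stream.getTracks().forEach((track) => pc.addTrack(track, stream));

      // An RTCDataChannel for arbitrary data alongside the media.
      const channel = pc.createDataChannel('chat');
      channel.onopen = () => channel.send('hello');
    })
    .catch((err) => console.error('getUserMedia failed:', err));
}
```

In a real application you would also wire up signaling and attach the remote stream to a video element, which the rest of this article covers.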
Applications using WebRTC
Some of the biggest companies in the world, including Google, Facebook and Amazon, have already embraced the technology and implemented it in their applications.
In my last article, An introduction to WebRTC, I covered the question: “What is WebRTC?”. In this article I will cover the basic concepts of WebRTC and explain how it works.
From a user's point of view, WebRTC simply works like magic, but under the surface it consists of many moving parts. Some of these parts are:
Signaling in WebRTC
Sending Media Data and Arbitrary Data
WebRTC requires two types of network interactions: signaling and media. It is assumed that signaling takes place over an HTTPS connection or a WebSocket, while media flows through media channels.
Signaling in WebRTC
WebRTC is a fully peer-to-peer (P2P) technology for real-time communication. Before peers can communicate with each other, they must exchange network information and negotiate media formats. WebRTC uses the Session Description Protocol (SDP) to exchange this information.
The SDP is assumed to be communicated by the application. This process is called signaling, and it is not part of WebRTC itself. This means you are free to use any technology you like to build your own signaling server.
The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. The rationale is that different applications may prefer to use different protocols, such as the existing SIP or Jingle call signaling protocols, or something custom to the particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the transport and media configuration information necessary to establish the media plane.
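Because signaling is left to the application, the wire format is entirely up to you. The sketch below assumes a hypothetical JSON envelope that an application might send over a WebSocket; the message types and the truncated SDP string are placeholders:

```javascript
// Hypothetical signaling envelope: WebRTC does not define this format,
// so each application invents its own (JSON over WebSocket is common).
function makeSignal(type, payload) {
  return JSON.stringify({ type, payload });
}

// The caller sends its SDP offer... (SDP body truncated for illustration)
const offerMsg = makeSignal('offer', { sdp: 'v=0 ...' });

// ...and the callee replies with its SDP answer.
const answerMsg = makeSignal('answer', { sdp: 'v=0 ...' });

// The receiving side simply parses the envelope and dispatches on type.
const incoming = JSON.parse(offerMsg);
```

In a browser application, the payload would come from `RTCPeerConnection.createOffer()` / `createAnswer()` and be applied with `setRemoteDescription()` on the other side.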
Earlier in the article we learned that peers need to exchange network information before they can communicate with each other. This information is known as an ICE candidate and details the available methods by which the peer can communicate (directly or through a TURN server). Each peer proposes its best candidates first.
ICE candidates using UDP are preferred, since UDP is faster and media streams can recover from interruptions relatively easily. The ICE standard allows TCP candidates as well, but not all browsers support ICE over TCP.
There are different types of ICE candidates. The type defines how the data makes its way from peer to peer. The types are “host”, “srflx”, “prflx” and “relay”; TCP candidates additionally carry a tcptype of “active”, “passive” or “so”. Read more about the different types of ICE candidates here.
One of the two peers serves as the controlling agent. The controlling agent makes the final decision as to which candidate pair to use for the connection.
NAT Traversal using STUN and TURN
Often a peer will be located on a private network behind a NAT (Network Address Translator) and will not have a unique public IP address that it can exchange with other peers.
Without a unique public IP address, a peer cannot communicate with peers on the other side of the NAT. To get around this problem, WebRTC uses STUN (Session Traversal Utilities for NAT) to discover the public IP address and port number that the NAT has allocated for the application's UDP flows to remote hosts.
Most WebRTC calls are made successfully using STUN, but in some cases a peer is located behind a firewall or a symmetric NAT, where WebRTC cannot establish a connection using STUN alone.
In situations where WebRTC cannot make a successful connection using STUN, it can use a TURN (Traversal Using Relays around NAT) server as a fallback to relay video, audio and arbitrary data between peers. A TURN server with a public IP can be contacted by peers even if they are behind firewalls or proxies.
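STUN and TURN servers are handed to the peer connection up front in its configuration. The sketch below uses placeholder URLs and credentials (example.org hosts, made-up username and password), not real servers:

```javascript
// Hypothetical ICE server configuration: the browser tries STUN first and
// falls back to relaying through the TURN server when direct paths fail.
const config = {
  iceServers: [
    // STUN: only used to discover the public address, no media flows through it.
    { urls: 'stun:stun.example.org:3478' },
    // TURN: relays media as a fallback, so it requires credentials.
    {
      urls: 'turn:turn.example.org:3478',
      username: 'alice',
      credential: 'secret'
    }
  ]
};

// Guarded so the sketch only constructs a connection where the API exists.
if (typeof RTCPeerConnection !== 'undefined') {
  const pc = new RTCPeerConnection(config);
}
```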
STUN and TURN are widely used tools for communication protocols to detect and traverse NATs located between communication endpoints.
Sending Audio, Video and Arbitrary Data
WebRTC is built with low-latency audio and video communication in mind and uses codecs to compress and decompress audio and video data. WebRTC supports VP8 and H.264 for video and G.711 and Opus for audio. All media data is sent over the Secure Real-time Transport Protocol (SRTP).
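The codecs a peer offers show up in its SDP as rtpmap attributes. As a sketch, the codec names can be read out of the session description like this (the SDP fragment below is made up for illustration):

```javascript
// Each negotiated codec appears in the SDP as an "a=rtpmap" attribute.
// This helper extracts the codec names from a session description.
function listCodecs(sdp) {
  return [...sdp.matchAll(/a=rtpmap:\d+ ([A-Za-z0-9-]+)\//g)].map((m) => m[1]);
}

// A made-up SDP fragment offering Opus, G.711 (PCMU) and VP8:
const sdpFragment = [
  'a=rtpmap:111 opus/48000/2',
  'a=rtpmap:0 PCMU/8000',
  'a=rtpmap:96 VP8/90000'
].join('\n');
```

In the browser, the same description is available from `RTCPeerConnection.localDescription.sdp` once an offer or answer has been created.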