Abstract

WebRTC

Summary

Application areas of WebRTC in contrast to Skype, FaceTime or Flash. Technology overview, current state, necessary infrastructure and potential problems in the heterogeneous Internet. melavi as WebRTC demonstration.

Tools/Technology

Programming: JavaScript, jQuery
Application: Integration of video and real-time data into Web applications
WEB: HTML5, HTTP, WebSockets
Technology: WebRTC, Streaming, Peer to Peer

Table of contents

WebRTC demonstration - melavi
WebRTC applications
What is new with WebRTC?
Technology
Preconditions
Summary

WebRTC, applications, technology, infrastructure, demonstration melavi

WebRTC demonstration - melavi

Before we start, a short demonstration. As the proverb says, a picture is worth a thousand words. Click on the following image, which links to melavi, the MediaLan demonstration video chat. Before trying to connect to a partner, please read the chat's help page, which describes the preconditions for its operation. melavi is a by-product of a WebRTC customer integration project. Its purpose is to test WebRTC functions under real Internet conditions in order to derive recommendations for the operational requirements.

WebRTC applications

Looking at the Web presences of product suppliers, one notices a growing trend to embed tools for personal support and service into the business logic of Web applications. The customer is offered a dialogue in which he can ask questions and get answers from a personal consultant. On the one hand this eases the buying decision; on the other hand good personal service is remembered. This human way of communicating stands in contrast to the interchangeable, impersonal Web presentations of complex products that dominate the current Internet. The "conversation" with a machine is replaced again by a conversation with a human being. Direct dialogue between people is revived and increasingly recognized as a key tool for sustainable customer retention.

If even a simple text chat has such positive effects, how much more benefit could face-to-face communication generate? Suddenly the company one buys from is no longer just an abstract Internet page but gets a face and even a personal contact with a name. WebRTC (Web Real-Time Communication) creates the technical preconditions for such situations.

Let's take other examples: the sensitive dialogue between a doctor and a patient or between a lawyer and his clients. Not every concern of a patient or client requires a visit to the doctor's or lawyer's office. Many minor matters of daily life can be handled much more efficiently in an online consultation, with a much lower burden for both sides. Long times in waiting rooms are reduced. The availability of the doctor or lawyer increases, and more clients can be served in the available time. The online meeting can also be a preliminary step for deciding whether a consultation in person is really necessary. Text chats or telephone calls are just compromises. Mostly they lack the intimacy of a face-to-face dialogue. Especially with sensitive topics, the personal aspect of a video connection is a substantial factor in establishing the necessary relationship of trust.

The diversity of potential applications based on an interactive, personal face-to-face dialogue is huge. In banking the classical branch network will fall out of use before long. Generations of customers are growing up who have never seen a bank from the inside. What will replace the conversation between a customer consultant and an investor or borrower: virtual WebRTC branches? Or take the communication between a tax consultant and his clients. Many everyday situations in which consultancy services are provided are potential domains for WebRTC applications, e.g. the contact between an insurance broker and his customers. Another possibility with interesting perspectives is the use as a sales tool for direct marketing. WebRTC could be used for the visual presentation of products or operating instructions, for advising customers on how to use a product, or even for extended collection of customer profile data. Technically it would be possible to take snapshots of a customer during a session and link them to the provider's customer database, of course only if the customer agrees.

WebRTC provides the foundation for establishing and operating autonomous communities which need an audiovisual component as an extension of their previous text-based communication. Whoever offers a service may become a self-managed provider. Interactive language courses or other online trainings are conceivable. WebRTC has the potential to create completely new application domains. The key to that is above all the direct combination of the video component with the business logic and data of a supplier.

What is new with WebRTC?

What is different about WebRTC compared to Microsoft Skype, Apple FaceTime or Adobe Flash? Is it just another way to transmit audio and video? Somewhat provocatively one could just as well ask: what made the difference between Teletex/Btx and the Web in the 1990s? In the very beginning of the Web it was possible to access textual information both ways. Today no one seriously questions the qualitative difference any longer.

The qualitative difference, and hence the potential, of WebRTC compared to classical multimedia and video chat solutions rests on the following arguments:

WebRTC is a well-defined standard

The standardization by IETF and W3C creates the precondition for compatible cross-platform communication and for bridging into other communication networks like VoIP. Interactive live audio and video communication combined with other real-time data becomes a worldwide standard which is embedded into the flexible and powerful infrastructure of Web browsers. WebRTC is the very first live video and real-time data communication technique which has the potential of being uniformly available on all imaginable end devices. It will offer communication without platform barriers.

WebRTC is completely adaptable to your own requirements

With regard to possible new applications this is the most interesting point. Previous technologies offer only very limited possibilities for realizing one's own ideas and solutions.

Part of the standard is a well-defined JavaScript interface. It allows the integration of WebRTC into your own applications. The break between business logic and stand-alone communication software and hardware vanishes. It is no longer necessary to start a separate Skype client to have a dialogue with a customer. The customer data needed for establishing connections can be provided directly by the Web application. The dialogue function becomes an integral part of the Web application. Altogether this results in a smooth, consistent usability of a Web application with integrated real-time data components.

It is possible to create completely independent solutions for embedding the dialogue function into your own portals. This starts with applying the corporate design of a Web presence and continues with combining video data with the business data of the solution. The combination of live video with a variety of additional data generates added value. Real-time measurement values, sensor data, stock tickers and bank data can be exchanged directly between the participants of a WebRTC session and visualized with the broad range of layout tools of a WebRTC-capable browser.

This customizability allows typical business processes to be reproduced in the Web browser. An example is the simulation of a waiting room in online consultations. As in real life, the online waiting room has a state like "opened" or "closed" and a list of clients waiting for a meeting with their consultant, for example a lawyer. The administration of the information needed to control the establishment of consultant-client dialogues can be adapted completely to the requirements of the individual service provider and its business model. The parameters of these consultations, such as duration, meeting minutes or start and end time, can be registered automatically and combined with other business data. Compared to inflexible solutions like Skype or FaceTime this is a huge gain in flexibility and interactivity.
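Such a waiting room could be modelled as a small state object inside the Web application. The following is only a minimal sketch: the class name WaitingRoom and all of its methods are hypothetical and not part of any WebRTC API. It merely illustrates the state ("opened"/"closed"), the list of waiting clients and the automatic registration of start time, end time and duration.

```javascript
// Hypothetical model of an online waiting room; not part of the WebRTC standard.
class WaitingRoom {
  constructor() {
    this.state = "closed"; // "opened" or "closed"
    this.queue = [];       // clients waiting, in order of arrival
    this.log = [];         // records of finished consultations
  }

  open() { this.state = "opened"; }
  close() { this.state = "closed"; }

  // A client may only join while the room is opened; duplicates are ignored.
  join(clientId) {
    if (this.state !== "opened") return false;
    if (!this.queue.includes(clientId)) this.queue.push(clientId);
    return true;
  }

  // The consultant calls the next client; returns null if nobody is waiting.
  callNext(now = Date.now()) {
    const clientId = this.queue.shift();
    if (clientId === undefined) return null;
    return { clientId, start: now };
  }

  // Register end time and duration of a consultation started by callNext().
  finish(session, now = Date.now()) {
    const record = { ...session, end: now, durationMs: now - session.start };
    this.log.push(record);
    return record;
  }
}
```

In a real application the point at which callNext() returns a session would be the moment to start the WebRTC connection between consultant and client.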

Based on the WebRTC standards and integration interfaces it is possible to establish autonomous communication networks. Communication can be restricted to certain user groups. In addition, operation and presentation can be individualized per user. If required, each WebRTC service can build up its own Internet infrastructure and become a provider with its own resources. This offers the highest possible degree of protection for communication data and the flexibility to realize your own solutions.

WebRTC data is safe!

Do you really trust Skype, Flash or FaceTime?

All WebRTC data, whether video, audio, real-time, business or signaling data, is encrypted by standardized, publicly approved methods. This allows the use of WebRTC for highly confidential applications. This property in particular is the precondition for getting WebRTC solutions certified for sensitive scenarios like doctor/patient or lawyer/client communication. Other communication techniques use proprietary technologies and rely on non-transparency for confidential data transmissions. This makes their use in the fields discussed at least questionable.

WebRTC is cross-platform compatible

As a Web standard, WebRTC is potentially usable on any computer, tablet or smartphone that offers a Web browser implementing the WebRTC standard. Already today a huge number of Internet devices can be reached via WebRTC-based video and real-time communication. The number of Web browsers which support WebRTC grows steadily, and with it the global spread of the technology across different device types. In the meantime even Microsoft seems to be willing to jump on the bandwagon. But of course on Microsoft Windows there are already alternative browsers, Google Chrome and Mozilla Firefox, which are quite mature with regard to WebRTC.

WebRTC does not need plugins

The function is provided directly by current Web browsers. It is not necessary to install any additional software. There is no need to keep track of updates for plugins or even stand-alone software clients like Skype, which may introduce security risks. As a genuine browser technology, WebRTC applications can use the broad range of Web technologies like CSS, JavaScript or HTML5 to adapt a solution individually to your own requirements. The platform binding of technologies like ActiveX and Internet Explorer is no limitation for WebRTC. The result is uniform cross-browser solutions which may even be implemented and operated much more simply than with previous technologies. The complete browser-side WebRTC development just requires JavaScript. Compared to that, embedding a Flash-based video chat is quite complex and requires bridging between the Adobe ActionScript world and the browser's JavaScript world. WebRTC even reduces development costs without compromises in flexibility.

WebRTC scales with the requirements

WebRTC is a so-called peer-to-peer technology. The data is exchanged directly between the browsers of the meeting partners. A centralized server for the exchange of the real-time data is not necessary. Hence there is no bottleneck caused by the limited performance of a server. The number of participants in a WebRTC solution is not limited by centralized resources.

Peer-to-peer communication in itself is not new; Skype, for example, already uses it. Nevertheless WebRTC offers advantages over proprietary approaches even here. For example, it uses a broad range of standardized technologies from the VoIP sector. This simplifies seamless communication between WebRTC browser applications and VoIP devices. Conversations between a browser and video phones or softphones or even classical PSTN (Public Switched Telephone Network) subscribers are possible.

Technology

WebRTC is an umbrella term for the interaction of a broad palette of different communication technologies. The following image shows that (incompletely). Parts of the technological base of WebRTC overlap with other standards, for example the RTP transport protocol or the optional use of the SIP protocol for signaling, both of which are used in the VoIP domain as well. Other parts of the standard are specific to WebRTC, for example the JavaScript programming interface of the W3C.

Some components of a WebRTC solution are intentionally left undefined. The yellow blocks in the image show that. This is meant to increase the flexibility for possible application scenarios.

[Figure: WebRTC technologies]

A large part of the building blocks of WebRTC is encapsulated by the browser. The Web application developer has no direct influence on the audio/video compression or the streaming transport. Topics like NAT (Network Address Translation) or encryption are part of the browser implementation as well. The application developer can concentrate on the workflows of his Web application. The biggest challenges for him are filling the signaling gap in the standard and the permanent struggle with browser-dependent incompatibilities. At least the latter diminishes as the standard matures.

Audio/Video codecs

The WebRTC standard does not define the methods for audio/video compression. It only defines how the participants of a WebRTC session inform each other about the methods they support. For that purpose SDP (Session Description Protocol) is used, which is a component of VoIP as well. Although the standard does not mandate specific codecs, the browser manufacturers establish de facto standards with their implementations. The WebRTC application developer has no freedom in this respect because he must use the methods supported by the browsers.
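To illustrate how codec information travels in SDP, the following sketch reads the "a=rtpmap" lines of two session descriptions and determines the codecs both sides announce. This is a simplified illustration only: the function names listCodecs and commonCodecs are made up for this example, format parameters ("a=fmtp") are ignored, and in a real session the browsers perform this negotiation themselves.

```javascript
// Extract the codecs announced in an SDP text via its "a=rtpmap" lines.
function listCodecs(sdp) {
  const codecs = [];
  let kind = null; // media kind of the current "m=" section (audio, video, ...)
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      kind = line.slice(2).split(" ")[0];
    } else if (line.startsWith("a=rtpmap:") && kind) {
      // Format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const [payload, encoding] = line.slice("a=rtpmap:".length).split(" ");
      const [name, clockRate] = encoding.split("/");
      codecs.push({
        kind,
        payloadType: Number(payload),
        name,
        clockRate: Number(clockRate),
      });
    }
  }
  return codecs;
}

// Codecs of the local description that the remote side announces as well,
// compared by media kind and (case-insensitive) codec name.
function commonCodecs(localSdp, remoteSdp) {
  const key = (c) => `${c.kind}/${c.name.toLowerCase()}`;
  const remote = new Set(listCodecs(remoteSdp).map(key));
  return listCodecs(localSdp).filter((c) => remote.has(key(c)));
}
```

If one side announces opus, PCMU and VP8 and the other side PCMU and H264, commonCodecs returns only the PCMU entry. There is no common video codec in that case, which is exactly the bridging problem between VP8-only and H.264-only endpoints.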

Currently the VP8 video codec is the most widespread because Google, which drives the WebRTC standardization process, uses it. One reason for Google's decision against the H.264 codec is licensing problems. In homogeneous WebRTC applications the limitation to VP8 is no problem. Problems arise if bridges to other applications or networks like VoIP are necessary. The codec is one reason why direct communication between WebRTC applications and VoIP devices is problematic. VoIP mostly uses H.264 or even the old H.263 standard. To bridge the gap a centralized transcoding gateway would be necessary. But that would cost a large share of the advantages of WebRTC and its peer-to-peer approach.

Like everything about WebRTC, the current development is quite dynamic. For example, Mozilla has recently announced support for H.264 alongside VP8 in Firefox ("Support of H.264 with Version 33"). The latest Microsoft statements also suggest that Internet Explorer will eventually support WebRTC, but only with H.264, which would prevent cross-browser communication between Google Chrome and Microsoft Internet Explorer. There is still much politics going on before the standards enter calmer waters.

Media streaming

As transport method for audio/video streaming the WebRTC standard defines RTP/RTCP, as in VoIP. Because the standard demands encrypted transmission, the SRTP (Secure RTP) protocol is used.

Encryption

The standard defines the encryption of audio, video and application data transports. The methods to exchange session encryption keys between the browsers are defined as well.

Signaling

Probably the most confusing part of WebRTC for the application developer is signaling, or rather its absence from the specification. Accessing the WebRTC functionality of the browser via the JavaScript API is a relatively simple affair. The common browser "hello world" examples show that with just a few lines of JavaScript code. This conceals that real WebRTC communication is much more demanding and that getting it to run beyond hello-world scenarios causes considerable implementation and infrastructure costs.

Except for special situations (e. g. the peer to peer communication between two browsers which reside in an intranet with well-known IP addresses of the participants) WebRTC requires the so-called signaling.

The task of signaling is to provide the peers of a WebRTC communication with the information necessary to establish the actual peer-to-peer payload connection. To create a peer-to-peer connection, two WebRTC clients need information about:

their addresses (so-called ICE candidates)
the media data formats which are supported on both sides, i.e. the codec information (via SDP)
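How a client might process these two kinds of information when they arrive over the signaling channel can be sketched as follows. The message format ({ type, sdp } and { type, candidate }) and the function name handleSignal are assumptions made for this example, since the standard deliberately leaves them open. The methods called on pc are those of the browser's RTCPeerConnection object.

```javascript
// Sketch of a handler for incoming signaling messages.
// `pc` is an RTCPeerConnection; `send` is a hypothetical function that
// transmits a message to the remote peer via the signaling channel.
async function handleSignal(pc, msg, send) {
  switch (msg.type) {
    case "offer": {
      // The remote SDP carries the codec information of the calling side.
      await pc.setRemoteDescription({ type: "offer", sdp: msg.sdp });
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      send({ type: "answer", sdp: answer.sdp });
      break;
    }
    case "answer":
      await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
      break;
    case "candidate":
      // An ICE candidate describes one possible address of the remote peer.
      await pc.addIceCandidate(msg.candidate);
      break;
    default:
      throw new Error("unknown signaling message: " + msg.type);
  }
}
```

The handler does not care how the messages are transported; that part is covered by whatever signaling channel the application chooses.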

[Figure: signaling]

This data cannot be exchanged directly between the browsers because they normally do not know each other's addresses. The reason is that the majority of Internet participants have neither a publicly accessible nor a static IP address.

Because the IPv4 address range is limited, IP addresses for Internet participants are normally assigned dynamically by their providers. There is no guarantee that an Internet subscriber gets the same IP address from its provider on successive connections. Even worse is the widespread use of so-called NAT (Network Address Translation) in the Internet access routers of participants. The router still has a public IP address from the provider. But all participants behind the router reside in a decoupled intranet which only knows private IP addresses. It takes additional effort, tools and infrastructure to connect two Internet participants behind different NAT routers directly, as is necessary for peer-to-peer communication.
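Whether an address belongs to such a private intranet can be checked against the three IPv4 ranges reserved for private use in RFC 1918. The helper below is a small illustrative sketch; the function name isPrivateIPv4 is chosen for this example only.

```javascript
// True if an IPv4 address lies in one of the private ranges of RFC 1918:
// 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
function isPrivateIPv4(address) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(address)) {
    throw new Error("not an IPv4 address: " + address);
  }
  const parts = address.split(".").map(Number);
  if (parts.some((p) => p > 255)) {
    throw new Error("not an IPv4 address: " + address);
  }
  const [a, b] = parts;
  return (
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

An address for which this check returns true is only meaningful inside its own intranet, so a peer on the other side of the NAT router cannot use it to open a connection.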

To establish peer-to-peer connections nevertheless, the signaling data must be exchanged via a centralized signaling server. This server has a well-known IP address and is hence directly accessible to all WebRTC clients. It allows the clients to exchange the relevant data as a precondition for establishing the actual peer-to-peer call. After that the signaling server is normally no longer needed, unless it provides additional centralized functionality such as HTTP Web server functions.

Despite the peer to peer character of WebRTC a centralized server infrastructure is necessary for the exchange of connection data. The standard just defines the "What" - the information which must be exchanged - but not the "How" - the transport channel.

[Figure: signaling server]

Different technologies and protocols can be used to implement the transport of the WebRTC signaling data. One of the most frequently used solutions at present is a proprietary transport based on the new HTML5 WebSockets. Alternatives are SIP over WebSockets or AJAX/XHR-based approaches for Web servers which do not support WebSockets, like Apache or Internet Information Server.
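The core of such a proprietary signaling transport can be very small, because the server only has to forward messages between the peers of a session. The following sketch is transport-agnostic and purely illustrative: the class SignalingRelay and the notion of a "room" are assumptions of this example. A connection is any object with a send(string) method, which is the interface a server-side WebSocket typically offers.

```javascript
// Hypothetical signaling relay; forwards messages between peers of a room.
class SignalingRelay {
  constructor() {
    this.rooms = new Map(); // room name -> Set of connections
  }

  join(room, connection) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(connection);
  }

  leave(room, connection) {
    const peers = this.rooms.get(room);
    if (!peers) return;
    peers.delete(connection);
    if (peers.size === 0) this.rooms.delete(room);
  }

  // Forward a message unchanged to all other peers in the room and return
  // the number of recipients. The relay does not need to understand the
  // SDP or ICE payload it transports.
  relay(room, sender, message) {
    let delivered = 0;
    for (const peer of this.rooms.get(room) || []) {
      if (peer !== sender) {
        peer.send(JSON.stringify(message));
        delivered++;
      }
    }
    return delivered;
  }
}
```

A WebSocket server would call join() when a client connects, relay() for every received message and leave() when the connection closes. Authentication and error handling are omitted here.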

NAT Traversal

As explained, today's Internet architecture poses a number of challenges for WebRTC which obstruct direct peer-to-peer communication between two clients or even make it impossible. NAT is one of the most serious architectural problems of the IPv4-based Internet. Although it is an ingenious tool for solving the address space limitation of IPv4, it is actually just a "workaround". In the long term IPv6 will solve the problem at its root. Until then the following infrastructure is required to get WebRTC running reasonably. These technologies have evolved as part of the VoIP infrastructure.

To identify the address information which can be used for establishing direct connections, STUN (Session Traversal Utilities for NAT) is used. This requires hosting an additional public Internet server as part of a WebRTC solution. There are NAT types where even the STUN approach fails (symmetric NAT). In those rare cases (statistics give figures of 10-15%) the last fallback is a TURN (Traversal Using Relays around NAT) server. TURN uses a relayed transport which routes the payload data between the clients through the TURN server. Of course that has negative consequences for scalability (limited performance of centralized resources) and for the latency of the real-time data transmission. Which type of transport (relayed or peer-to-peer) is used is decided by the browser. For that it uses a trial-and-error method (ICE, Interactive Connectivity Establishment).
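From the application's point of view, STUN and TURN servers are simply listed in the configuration of the peer connection, and the candidates gathered by ICE reveal how each address was obtained. The sketch below shows both. The server names and credentials are placeholders, not real services, and the helper candidateRoute is a made-up name for this example.

```javascript
// Configuration object as it would be passed to `new RTCPeerConnection(...)`
// in the browser. Host names and credentials are placeholders.
const iceConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo", credential: "secret" },
  ],
};

// Classify an ICE candidate line by its "typ" field.
function candidateRoute(candidateLine) {
  const match = / typ (host|srflx|prflx|relay)\b/.exec(candidateLine);
  if (!match) throw new Error("no candidate type found");
  const descriptions = {
    host: "local address of the device",
    srflx: "public address learned from a STUN server",
    prflx: "address discovered during the connectivity checks",
    relay: "address on a TURN server, payload is relayed",
  };
  const type = match[1];
  return { type, description: descriptions[type], relayed: type === "relay" };
}
```

A candidate of type "relay" means that the payload would flow through the TURN server, with the consequences for scalability and latency described above.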

The explanations show that the infrastructure requirements of a WebRTC community are relatively high if one wants more than just to experiment with the technology. In addition to the Web server for the Web application, a signaling server, a STUN server and a TURN server are necessary. A complete WebRTC infrastructure is shown in the following picture:

[Figure: NAT traversal]

STUN and TURN can be combined in one server or run separately, depending on scalability requirements. If a bridge into the VoIP network is required, additional infrastructure components like media and protocol transcoding gateways must be added to the list. Because of the relatively high complexity of a self-hosted WebRTC application, a growing number of WebRTC providers currently offer that infrastructure and sell WebRTC as SaaS/IaaS (Software/Infrastructure as a Service) business models.

WebRTC API

The playgrounds of the WebRTC application developer are the implementation of a signaling solution and the WebRTC JavaScript API. This API standardizes three JavaScript objects and their methods:

getUserMedia realizes the access to the local audio and video sources of a WebRTC client
RTCPeerConnection encapsulates the establishment of a WebRTC connection to remote WebRTC clients. The object represents the actual WebRTC interface of the browser and is the most complex part of the API. Central tasks are the identification of address information for the connection establishment and the control of the peer to peer streaming connections. For that it implements internally STUN/TURN and ICE mechanisms and accesses and controls the media codecs.
RTCDataChannel allows the exchange of any binary or text payload data between the WebRTC clients.
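The interplay of these three parts on the calling side might look like the following sketch. The function name startCall, the message format and the send function are assumptions of this example, because the signaling transport is up to the application. In current browsers getUserMedia is reached via navigator.mediaDevices.

```javascript
// Caller-side sketch. `pc` is an RTCPeerConnection, `stream` a MediaStream
// obtained from getUserMedia, `send` a hypothetical signaling function.
async function startCall(pc, stream, send) {
  // Hand the local audio and video tracks to the peer connection.
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }
  // Data channel for arbitrary application data next to the media streams.
  // It is created before the offer so that it is part of the negotiation.
  const channel = pc.createDataChannel("app-data");
  // The browser reports its ICE candidates one by one; the final event
  // carries no candidate and is skipped.
  pc.onicecandidate = (event) => {
    if (event.candidate) send({ type: "candidate", candidate: event.candidate });
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send({ type: "offer", sdp: offer.sdp });
  return channel;
}

// Browser usage (not executed here; host name and `socket` are placeholders):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
//   const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.org" }] });
//   const channel = await startCall(pc, stream, (msg) => socket.send(JSON.stringify(msg)));
```

Passing pc and stream in as parameters keeps the sketch independent of a concrete page and makes it possible to exercise the logic with stand-in objects outside the browser.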

The degrees of freedom for the application developer to control the WebRTC base of the browser are still quite modest. Apart from the video resolution, no further quality parameters are accessible to influence the streaming. Complex communication aspects like encryption, bandwidth control or congestion control are handled completely under the hood. Nevertheless the JavaScript API combined with other browser resources offers an astonishingly broad range of customization possibilities.
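The video resolution is requested through the constraints object passed to getUserMedia. The helper below is a minimal sketch with a made-up name, mediaConstraints; it uses the "ideal" form of the width and height constraints, which expresses a wish rather than a hard requirement.

```javascript
// Build a getUserMedia constraints object that asks for a video resolution.
// "ideal" values are a preference; the browser selects the closest mode
// the camera actually offers.
function mediaConstraints(width, height) {
  return {
    audio: true,
    video: { width: { ideal: width }, height: { ideal: height } },
  };
}

// Browser usage (not executed here):
//   navigator.mediaDevices.getUserMedia(mediaConstraints(1280, 720));
```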

WebRTC – current preconditions

WebRTC works best with the newest versions of Google Chrome, Mozilla Firefox and Opera. Given the current pace of development, one should always use the newest browser versions available. With Google Chrome it is possible to establish WebRTC connections to mobile Android devices. Microsoft Internet Explorer does not yet support WebRTC. At least under Windows that is not a serious obstacle to the introduction of WebRTC because there are sufficient alternatives. WebRTC connectivity to iOS-based devices is currently very limited. There are providers like TokBox which offer native iOS apps with WebRTC support. A further, brand-new alternative may be Ericsson's OpenWebRTC initiative and its associated browser "bowser". Apple's Safari browser does not support WebRTC.

For a first test whether the own platform is fit for WebRTC one can use the Google reference application:

apprtc.appspot.com

It uses the WebRTC signaling and NAT-traversal infrastructure of Google.

Cross-browser communication, for example between Chrome and Firefox, works in principle. But the behaviour of the browsers differs in some details, which may obstruct high-quality conversations. When communicating between the newest versions of Chrome and Firefox the author experienced severe quality problems which do not occur in homogeneous Chrome-to-Chrome sessions. The disturbances increased over time: latency grows, audio drops out and the video stutters. To avoid disappointment when starting with this fascinating technology one should still avoid cross-browser sessions if possible. As the standard advances, that situation will hopefully improve.

Summary

Despite the incomplete standardization process and the political disputes of the big players, WebRTC has gained the necessary momentum to become a new mass medium, 4 years after the project got started. It offers excellent customization possibilities which will result in many new use cases over time. Big players that were missing so far, like Microsoft, seem to be jumping on the bandwagon, which will drive progress further. Even the blockade against the H.264 standard as an alternative codec seems to be showing its first cracks. But even without that the potential is very high.

The number of WebRTC infrastructure and framework providers is growing. Despite the adaptations still required by changes in the standardization process, a large number of Web applications with integrated WebRTC functionality already exists. The investment risks which always come with the introduction of a new technology are going down. Beyond the usual phase of hype and marketing euphoria, the first added value of the technology in Web applications is starting to show.

As shown the potential application scenarios of WebRTC cover a broad range. The combination of media and other real-time data with classical Web applications and their workflows will create new potentials for customer retention, acquisition and sales processes. The old dream of savings and usage effects due to the replacement of physical presence by a remote live media presence is pushed by WebRTC to a great extent.

Now is the right moment to get on board with this new technology. Our expertise may help you with planning, workflow analysis, infrastructure installation and operation, and software integration.