AR/VR on WebRTC: WebGL, Three.js and WebRTC

For the last couple of weeks, I have been working on rendering 3D graphics on a WebRTC media stream using different JavaScript libraries, as part of a Virtual Reality project.

What is Augmented Reality?

Augmented reality (AR) is a view of a real-world environment whose elements are supplemented by computer-generated sensory inputs such as sound, video, graphics and location data.

How is AR different from VR (Virtual Reality)?

Virtual Reality:
  • replaces the real world with a simulated one
  • user is isolated from real life
  • Examples: Oculus Rift and Kinect

Augmented Reality:
  • blends virtual reality and real life
  • user interacts with the real world through digital overlays
  • Examples: Google Glass and HoloLens

Methods for rendering Augmented Reality

  • Computer Vision
  • Object Recognition
  • Eye Tracking
  • Face Detection and substitution
  • Emotion and gesture picker
  • Edge Detection

Building a web-based Augmented Reality platform draws on three groups of components for an end-to-end AR solution. Web components include WebRTC getUserMedia, the Web Speech API, CSS, SVG, HTML5 canvas and the sensor APIs. Hardware components can include the graphics driver, media capture devices such as the microphone and camera, and sensors. 3D components include geometry and math utilities, 3D model loaders and models, lights, materials, shaders, particles and animation.

WebRTC (Web Real-Time Communications)

WebRTC gives web applications access to the browser's media streams and data. It is standardized at the API level at the W3C and at the protocol level at the IETF. WebRTC enables browser-to-browser applications for voice calling, video chat and P2P file sharing without plugins, giving web browsers Real-Time Communications (RTC) capabilities.

Code snippet for WebRTC API

1. To begin with WebRTC, we first need to check that the browser can give us access to the webcam. Find out if the user’s browser supports the getUserMedia API.

function hasGetUserMedia() {
	return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}

2. Get the stream from the user’s webcam.

var video = $('#webcam')[0];
if (hasGetUserMedia()) {
	navigator.mediaDevices.getUserMedia({audio:true, video:true})
		.then(function(stream) { video.srcObject = stream; })
		.catch(function(e) { alert('Webcam error: ' + e.name); });
}
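
Older browsers exposed this API only under prefixed, callback-based names such as webkitGetUserMedia, while current browsers provide the promise-based navigator.mediaDevices.getUserMedia. A minimal sketch of a wrapper that covers both is shown below. The helper name getUserMediaCompat, and passing the navigator object in as a parameter so the function can be exercised outside a browser, are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of any library): returns a function that
// resolves with a MediaStream, or null when no capture API is available.
function getUserMediaCompat(nav) {
  if (nav && nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function') {
    // Modern, promise-based API.
    return function (constraints) {
      return nav.mediaDevices.getUserMedia(constraints);
    };
  }
  var legacy = nav && (nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia);
  if (typeof legacy === 'function') {
    // Legacy callback API wrapped in a Promise.
    return function (constraints) {
      return new Promise(function (resolve, reject) {
        legacy.call(nav, constraints, resolve, reject);
      });
    };
  }
  return null;
}

// Browser usage; the '#webcam' element id comes from the snippet above.
if (typeof navigator !== 'undefined' && typeof document !== 'undefined') {
  var capture = getUserMediaCompat(navigator);
  var webcam = document.querySelector('#webcam');
  if (capture && webcam) {
    capture({ audio: true, video: true })
      .then(function (stream) { webcam.srcObject = stream; })
      .catch(function (e) { console.error('Webcam error:', e.name); });
  }
}
```

Assigning the stream to srcObject is the current way to attach a MediaStream to a video element; creating an object URL from a stream is no longer supported.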

Screenshot AppRTC


End to End RTC Pipeline for AR


WebGL

  • Web Graphics Library
  • JavaScript API for rendering interactive 2D and 3D computer graphics in the browser
  • no plugins
  • uses GPU (Graphics Processing Unit) acceleration
  • can mix with other HTML elements
  • uses the HTML5 canvas element and is accessed using Document Object Model interfaces
  • cross-platform, works on all major desktop and mobile browsers
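
As a rough illustration of the canvas and DOM points above, the sketch below asks a canvas for a WebGL context and clears it on the GPU. The helper name getWebGLContext and its return shape are assumptions for this example; the context names passed to getContext are the standard ones.

```javascript
// Hypothetical helper: returns the first WebGL context the canvas can
// provide, or null. Accepts any object that has a getContext method.
function getWebGLContext(canvas) {
  var names = ['webgl2', 'webgl', 'experimental-webgl'];
  for (var i = 0; i < names.length; i++) {
    var gl = canvas.getContext(names[i]);
    if (gl) {
      return { name: names[i], gl: gl };
    }
  }
  return null;
}

// Browser usage: clear a canvas to opaque black.
if (typeof document !== 'undefined') {
  var found = getWebGLContext(document.createElement('canvas'));
  if (found) {
    found.gl.clearColor(0.0, 0.0, 0.0, 1.0);
    found.gl.clear(found.gl.COLOR_BUFFER_BIT);
  }
}
```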

WebGL Development

To get started you should know about :

  • GLSL, the shading language used by OpenGL and WebGL
  • Matrix computation to set up transformations
  • Vertex buffers to hold data about vertex positions, normals, colors, and textures
  • matrix math to animate shapes
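
To give a feel for the vertex buffer and matrix items in this list, here is a small sketch in plain JavaScript that needs no WebGL context. It packs a triangle into a Float32Array and rotates it with a column-major 4x4 matrix, which is the layout WebGL uniforms expect. The function names are made up for this example.

```javascript
// Vertex positions for one triangle, packed the way a WebGL vertex buffer
// expects them: x, y, z per vertex in a flat Float32Array.
var triangle = new Float32Array([
  1, 0, 0,
  0, 1, 0,
  0, 0, 1
]);

// 4x4 rotation about the Y axis, stored column-major.
function rotationY(angle) {
  var c = Math.cos(angle), s = Math.sin(angle);
  return [
    c, 0, -s, 0,   // column 0
    0, 1,  0, 0,   // column 1
    s, 0,  c, 0,   // column 2
    0, 0,  0, 1    // column 3
  ];
}

// Multiply a column-major 4x4 matrix by the point (x, y, z, 1).
function transformPoint(m, x, y, z) {
  return [
    m[0] * x + m[4] * y + m[8]  * z + m[12],
    m[1] * x + m[5] * y + m[9]  * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14]
  ];
}

// Apply the rotation to every vertex in a flat x, y, z array.
function rotateVertices(vertices, angle) {
  var m = rotationY(angle);
  var out = new Float32Array(vertices.length);
  for (var i = 0; i < vertices.length; i += 3) {
    var p = transformPoint(m, vertices[i], vertices[i + 1], vertices[i + 2]);
    out[i] = p[0];
    out[i + 1] = p[1];
    out[i + 2] = p[2];
  }
  return out;
}

// A quarter turn moves the vertex on the X axis onto the negative Z axis.
var rotated = rotateVertices(triangle, Math.PI / 2);
```

In a real WebGL program the same multiplication would run in a GLSL vertex shader, with the matrix uploaded as a uniform.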

Clearly, WebGL is a bit tough given the amount of careful coding, mapping and shading it requires.

Proceeding to some JS libraries that can make 3D easy for us.






emotion & gesture-based arpeggiator and synthesizer


Three.js: MIT-licensed JavaScript 3D engine (WebGL and more).

3D space with webcam input as texture

Display the video as a plane which can be viewed from various angles in a given background landscape. Credits for below code :

1. Use the code from slide 10 (the WebRTC snippet above) to get the user’s webcam input through getUserMedia

2. Make a scene, camera and renderer as described in the Spinning Colored Cube example

3. Give orbital controls for viewing the media plane from all angles
	controls = new THREE.OrbitControls( camera, renderer.domElement );

4. Make the floor with an image texture

	var floorTexture = new THREE.TextureLoader().load( 'imageURL.jpg' );
	floorTexture.wrapS = floorTexture.wrapT = THREE.RepeatWrapping;
	floorTexture.repeat.set( 10, 10 );
	var floorMaterial = new THREE.MeshBasicMaterial({map: floorTexture, side: THREE.DoubleSide});
	var floorGeometry = new THREE.PlaneGeometry(1000, 1000, 10, 10);
	var floor = new THREE.Mesh(floorGeometry, floorMaterial);
	floor.position.y = -0.5;
	floor.rotation.x = Math.PI / 2;
	scene.add(floor);

5. Add fog

scene.fog = new THREE.FogExp2( 0x9999ff, 0.00025 );
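
To see what a density of 0.00025 means in practice, the sketch below evaluates the usual exponential-squared fog formula, which is my understanding of what FogExp2 applies per pixel. The function names are illustrative and not part of three.js.

```javascript
// Exponential-squared fog: 0 means no fog, values near 1 mean fully fogged.
function fogExp2Factor(density, depth) {
  var x = density * depth;
  return 1 - Math.exp(-x * x);
}

// Linear blend of an object colour towards the fog colour.
function applyFog(color, fogColor, factor) {
  return color.map(function (c, i) {
    return c + (fogColor[i] - c) * factor;
  });
}

var density = 0.00025;                                // value used in the step above
var fogColor = [0x99 / 255, 0x99 / 255, 0xff / 255];  // 0x9999ff as RGB in 0..1

// Print the fog factor at a few distances from the camera.
[100, 1000, 4000].forEach(function (depth) {
  console.log(depth, fogExp2Factor(density, depth).toFixed(4));
});

// Colour of a pure red surface seen through the fog at 4000 units.
var foggedRed = applyFog([1, 0, 0], fogColor, fogExp2Factor(density, 4000));
```

With this density, the fog is barely visible at 100 units and covers roughly 63% of an object's colour at 4000 units, so the far edge of a large scene fades into the fog colour.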

6. Add the video image context and texture.

// 'monitor' is the <video> element; 'videoImage' is a <canvas> used as the texture source
var video = document.getElementById( 'monitor' );
var videoImage = document.getElementById( 'videoImage' );
var videoImageContext = videoImage.getContext( '2d' );
// fill the canvas with black until the first video frame arrives
videoImageContext.fillStyle = '#000000';
videoImageContext.fillRect( 0, 0, videoImage.width, videoImage.height );
var videoTexture = new THREE.Texture( videoImage );
videoTexture.minFilter = THREE.LinearFilter;
videoTexture.magFilter = THREE.LinearFilter;
var movieMaterial = new THREE.MeshBasicMaterial({map: videoTexture, side: THREE.DoubleSide});
var movieGeometry = new THREE.PlaneGeometry( 100, 100, 1, 1 );
var movieScreen = new THREE.Mesh( movieGeometry, movieMaterial );
scene.add( movieScreen );
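7. Set camera position
8. Define the render function
function render() {
    videoImageContext.drawImage( video, 0, 0, videoImage.width, videoImage.height );
    videoTexture.needsUpdate = true; // re-upload the canvas to the GPU
    renderer.render( scene, camera );
}
9. Animation: call animate() once to start the loop
function animate() {
    requestAnimationFrame( animate );
    render();
}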
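
One detail that is easy to miss in the steps above: a THREE.Texture backed by a canvas is only re-uploaded to the GPU when its needsUpdate flag is set, and there is no point copying a frame before the video has decoded one. A minimal sketch of that per-frame update follows. The helper name updateVideoTexture is an assumption for illustration; readyState and needsUpdate are the standard HTMLMediaElement and three.js properties.

```javascript
// HTMLMediaElement.HAVE_CURRENT_DATA: at least one frame is available.
var HAVE_CURRENT_DATA = 2;

// Hypothetical helper: copy the current video frame into the canvas and flag
// the texture for re-upload. Returns true when a frame was copied.
function updateVideoTexture(video, context, texture, width, height) {
  if (video.readyState < HAVE_CURRENT_DATA) {
    return false; // no decoded frame yet, keep the previous canvas contents
  }
  context.drawImage(video, 0, 0, width, height);
  texture.needsUpdate = true; // tell three.js to upload the canvas again
  return true;
}
```

Inside the render function this would be called as updateVideoTexture(video, videoImageContext, videoTexture, videoImage.width, videoImage.height) just before renderer.render.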


WASM (WebAssembly) is a portable binary-code format with a corresponding text format. It facilitates interaction between compiled programs, for example C++ programs, and their host environment, such as JavaScript code in the browser.

Emscripten is one such compiler toolchain and is a way to compile C++ into WASM.
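
To make the JavaScript side of that interaction concrete, the sketch below loads a tiny hand-assembled WASM module through the standard WebAssembly JavaScript API and calls its exported function. The bytes were written by hand for this example rather than produced by Emscripten; a real toolchain would emit a much larger module plus glue code.

```javascript
// Hand-assembled WebAssembly module equivalent to:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0 local.get 1 i32.add))
var wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,        // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,  // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                // function section: func 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,  // export section: "add" is func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                          // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b                     // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compile and instantiate; acceptable for a module this small.
var wasmModule = new WebAssembly.Module(wasmBytes);
var wasmInstance = new WebAssembly.Instance(wasmModule, {});

// The exported function is callable like any JavaScript function.
var sum = wasmInstance.exports.add(2, 3);
console.log(sum); // 5
```

For larger modules, the asynchronous WebAssembly.instantiate or WebAssembly.instantiateStreaming calls are the usual choice, since browsers may limit synchronous compilation on the main thread.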

GPL support for AR

Compiler/CUDA/OpenGL/Vulkan/Graphics/Fortran/GPGPU Developer.

Web media APIs like MSE (Media Source Extensions) and EME (Encrypted Media Extensions)

AR Processing pipeline 

Credits: MediaPipe, Google AI

The on-device machine learning pipeline consists of a platform solution such as MediaPipe (above) along with WASM. The WASM SIMD (single instruction, multiple data, for parallel processing) ML inference engine can be XNNPACK or any other mobile-platform neural network inference framework. This is followed by rendering.

GPU-accelerated segmentation (WebGL) outperforms CPU segmentation using WASM SIMD, taking the latency down from ~8.7 ms to ~4.3 ms. The novel WebGL inference path works via optimized fragment shaders using MRT (multiple render targets).

Credits: Intel Video analytics pipeline p7

GStreamer Video Analytics



1. Spinning Colored Cube

Step 1 : Get three.js from :

Step 2 : Make an empty HTML5 page and import the script + basic styling of page

	<head>
		<title>Spinning colored Cube</title>
		<style>
			body { margin: 0; }
			canvas { width: 100%; height: 100% }
		</style>
	</head>
	<body>
		<script src="js/three.min.js"></script>
		<script>// Our Javascript will go here. </script>
	</body>

Step 3 : Scene

var scene = new THREE.Scene();	

Step 4 : Camera
Camera types in three.js are CubeCamera, OrthographicCamera and PerspectiveCamera. We are using a PerspectiveCamera here. Its attributes are field of view, aspect ratio, and the near and far clipping planes.

var camera = new THREE.PerspectiveCamera( 75, window.innerWidth / window.innerHeight, 0.1, 1000 );
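
To get a feel for these attributes, the sketch below computes how much of the world a perspective camera sees at a given distance. It uses the 75 degree field of view from the line above and a distance of 5 units, which is where this example later places the camera. The function name visibleSize and the 16:9 aspect ratio are assumptions for illustration; in three.js the field of view is vertical and given in degrees.

```javascript
// Height and width of the view frustum at a given distance in front of a
// perspective camera. fovDegrees is the vertical field of view.
function visibleSize(fovDegrees, aspect, distance) {
  var fovRadians = fovDegrees * Math.PI / 180;
  var height = 2 * Math.tan(fovRadians / 2) * distance;
  return { width: height * aspect, height: height };
}

// Field of view 75 degrees, 16:9 window, object 5 units from the camera.
var size = visibleSize(75, 16 / 9, 5);
console.log(size.height.toFixed(2), size.width.toFixed(2));
```

At that distance the view is about 7.7 units tall, so a cube with sides of 1 unit fills roughly an eighth of the window height.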

Step 5: Renderer
Renderer uses a <canvas> element to display the scene to us.

var renderer = new THREE.WebGLRenderer();
renderer.setSize( window.innerWidth, window.innerHeight );
document.body.appendChild( renderer.domElement );

Step 6: Geometry
The BoxGeometry object contains all the points (vertices) and fill (faces) of the cube.

var geometry = new THREE.BoxGeometry( 1, 1, 1 );

Step 7: Material
three.js has materials such as LineBasicMaterial, MeshBasicMaterial, MeshPhongMaterial and MeshLambertMaterial.
These have properties such as id, name, color, opacity and transparent. We use MeshBasicMaterial with a color attribute of 0x00ff00, which is green.

	var material = new THREE.MeshBasicMaterial( { color: 0x00ff00 } );

Step 8: Mesh
A mesh is an object that takes a geometry, and applies a material to it, which we then can insert to our scene, and move freely around.

var cube = new THREE.Mesh( geometry, material );

Step 9: By default, when we call scene.add(), the thing we add will be added to the coordinates (0,0,0). This would cause both the camera and the cube to be inside each other. To avoid this, we simply move the camera out a bit.

scene.add( cube );
	camera.position.z = 5;

Step 10: Create a loop to render something on the screen

function render() {
	requestAnimationFrame( render );
	renderer.render( scene, camera );
}
render();

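This will create a loop that causes the renderer to draw the scene every time the screen is refreshed (typically 60 times per second).

Step 11 : Animating the cube
Add the following inside the render function, above the renderer.render call. It will run every frame (typically 60 times per second) and give the cube a nice rotation animation.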

cube.rotation.x += 0.1;
cube.rotation.y += 0.1;
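
A fixed increment per frame ties the rotation speed to the display's refresh rate, so the cube spins faster on a 120 Hz screen than on a 60 Hz one. One possible alternative, sketched below, scales the increment by the time elapsed between frames. The names advanceRotation, makeSpinner and SPEED are made up for this example; the timestamp argument is what requestAnimationFrame passes to its callback.

```javascript
// Advance an angle by a speed in radians per second over an elapsed time.
function advanceRotation(angle, radiansPerSecond, elapsedMs) {
  return angle + radiansPerSecond * (elapsedMs / 1000);
}

// 0.1 rad per frame at 60 frames per second is 6 rad per second.
var SPEED = 0.1 * 60;

// Returns a per-frame step function for an object with x and y angles,
// such as cube.rotation. The first call only records the start time.
function makeSpinner(target) {
  var last = null;
  return function step(timestampMs) {
    if (last !== null) {
      var elapsed = timestampMs - last;
      target.x = advanceRotation(target.x, SPEED, elapsed);
      target.y = advanceRotation(target.y, SPEED, elapsed);
    }
    last = timestampMs;
  };
}
```

In the render loop this could be used as var spin = makeSpinner(cube.rotation); and then spin(time) inside function render(time), before the renderer.render call.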


2. Shaded Material on Sphere

Step 1 : Create an empty page and import three.min.js and jQuery

	<head>
		<title>Shaded Material on Sphere </title>
		<style>
			body { margin: 0; }
			canvas { width: 100%; height: 100% }
		</style>
	</head>
	<body>
		<div id="container"></div>
		<script src="js/jquery.min.js"></script>
		<script src="js/three.min.js"></script>
		<script>// Our Javascript will go here.</script>
	</body>

Step 2 : Repeat the same steps as in the previous example

var scene = new THREE.Scene();
var camera =  new THREE.PerspectiveCamera(45, 600/600 , 0.1, 10000);
var renderer = new THREE.WebGLRenderer();
renderer.setSize(600 , 600 );
camera.position.z = 300;	// the camera starts at 0,0,0 so pull it back
$('#container').append(renderer.domElement);	// attach the canvas to the page

Step 3 : Create the sphere’s material as MeshLambertMaterial
MeshLambertMaterial is a material for non-shiny (Lambertian) surfaces, evaluated per vertex. Set the color to red.

var sphereMaterial =  new THREE.MeshLambertMaterial(  { color: 0xCC0000  });
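
For intuition about what "Lambertian" means, the sketch below computes the diffuse term for a single surface point: the brightness is the cosine of the angle between the surface normal and the direction to the light, clamped at zero. This is the textbook formula and not three.js's shader code. The light position matches the point light used later in this example, and the function names are assumptions for illustration.

```javascript
// Scale a 3-component vector to unit length.
function normalize(v) {
  var len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
}

function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Diffuse intensity at a surface point: cosine of the angle between the
// surface normal and the direction to the light, clamped at zero.
function lambert(point, normal, lightPosition) {
  var toLight = normalize([
    lightPosition[0] - point[0],
    lightPosition[1] - point[1],
    lightPosition[2] - point[2]
  ]);
  return Math.max(0, dot(normalize(normal), toLight));
}

// Sphere of radius 50 at the origin, point light at (10, 50, 130).
var light = [10, 50, 130];
var front = lambert([0, 0, 50], [0, 0, 1], light);   // faces the light
var back = lambert([0, 0, -50], [0, 0, -1], light);  // faces away from it
console.log(front.toFixed(3), back.toFixed(3));
```

The point facing the light comes out at about 0.84 of full brightness while the point on the far side gets 0, which is why the rendered sphere has a lit side and a dark side.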

Step 4 : Create a new mesh with sphere geometry (radius, segments, rings) and add it to the scene

var sphere = new THREE.Mesh(  new THREE.SphereGeometry(  50, 16, 16 ),  sphereMaterial);
scene.add(sphere);
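
The segments and rings arguments control how finely the sphere is tessellated. As a sketch of the idea, and not three.js's actual implementation, the code below generates the grid of points for a sphere built from latitude rings and longitude segments. The function name spherePoints is made up for this example.

```javascript
// Generate the grid of points for a sphere made of latitude rings and
// longitude segments. Each point is an [x, y, z] triple.
function spherePoints(radius, segments, rings) {
  var points = [];
  for (var r = 0; r <= rings; r++) {
    var phi = Math.PI * r / rings;             // 0 at the top pole, PI at the bottom
    for (var s = 0; s <= segments; s++) {
      var theta = 2 * Math.PI * s / segments;  // once around the vertical axis
      points.push([
        radius * Math.sin(phi) * Math.cos(theta),
        radius * Math.cos(phi),
        radius * Math.sin(phi) * Math.sin(theta)
      ]);
    }
  }
  return points;
}

// Same arguments as the sphere above: radius 50, 16 segments, 16 rings.
var points = spherePoints(50, 16, 16);
console.log(points.length); // 289
```

Doubling both counts roughly quadruples the number of points, so a smoother silhouette comes at the cost of more geometry to process.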

Step 5 : Light
Create a light, set its position and add it to the scene as well. The light can be a point light, spot light or directional light.

var pointLight = new THREE.PointLight(0xFFFFFF);
pointLight.position.x = 10;
pointLight.position.y = 50;
pointLight.position.z = 130;
scene.add(pointLight);

Step 6 : Render the whole thing

renderer.render(scene, camera);


3. Complex objects like TorusKnot

Step 1 : Same as before, make the scene, camera and renderer

var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(125, window.innerWidth / window.innerHeight, 1, 500);
camera.position.set(0, 0, 100);
camera.lookAt(new THREE.Vector3(0, 0, 0));
var renderer = new THREE.WebGLRenderer();
renderer.setSize( window.innerWidth, window.innerHeight );
document.body.appendChild( renderer.domElement );

Step 2 : Add the lighting

var light = new THREE.PointLight(0xffffff);
light.position.set(0, 250, 0);
scene.add(light);
var ambientLight = new THREE.AmbientLight(0x111111);
scene.add(ambientLight);

Step 3 : Add TorusKnotGeometry with radius, tube, tubularSegments, radialSegments, p, q

var geometry = new THREE.TorusKnotGeometry( 8, 2, 100, 16, 4, 3 ); 
var material = new THREE.MeshLambertMaterial( { color: 0x2022ff } );
var torusKnot = new THREE.Mesh( geometry, material ); 
torusKnot.position.set(3, 3, 3);
scene.add( torusKnot );
camera.position.z =25;
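
The last two arguments of the geometry above (4 and 3) are the winding numbers p and q of the knot. To show what they do, the sketch below samples the centre line of a (p, q) torus knot using one common parametrization. I believe it is close to what three.js uses internally, but treat that as an assumption. The function names are made up for this example.

```javascript
// Point on the centre line of a (p, q) torus knot for a parameter u that
// runs from 0 to 2 * PI * p.
function torusKnotPoint(u, p, q, radius) {
  var quOverP = (q / p) * u;
  var cs = Math.cos(quOverP);
  return [
    radius * (2 + cs) * 0.5 * Math.cos(u),
    radius * (2 + cs) * 0.5 * Math.sin(u),
    radius * Math.sin(quOverP) * 0.5
  ];
}

// Sample the closed curve; the tube of the knot is swept around these points.
function sampleTorusKnot(p, q, radius, tubularSegments) {
  var pts = [];
  for (var i = 0; i <= tubularSegments; i++) {
    var u = (i / tubularSegments) * p * 2 * Math.PI;
    pts.push(torusKnotPoint(u, p, q, radius));
  }
  return pts;
}

// Same values as the geometry above: radius 8, 100 segments, p = 4, q = 3.
var curve = sampleTorusKnot(4, 3, 8, 100);
console.log(curve.length, curve[0]);
```

The curve starts and ends at the same point, so sweeping a tube of radius 2 around it gives the closed knot shown on screen.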

Step 4 : Do the animation and render on screen

var render = function () {
    requestAnimationFrame( render );
    torusKnot.rotation.x += 0.01;
    torusKnot.rotation.y += 0.01;
    renderer.render(scene, camera);
};
render();
