For the last couple of weeks, I have been working on rendering 3D graphics on a WebRTC media stream using different JavaScript libraries, as part of a Virtual Reality project.
- AR vs VR
- WebRTC
- WebGL
- Three.js
- WASM/OpenGL
What is Augmented Reality?
Augmented reality (AR) is a view of a real-world environment whose elements are supplemented by computer-generated sensory input such as sound, video, graphics, and location data.
How is AR different from VR (Virtual Reality)?
| Virtual Reality | Augmented Reality |
| --- | --- |
| Replaces the real world with a simulated one | Blends virtual elements with real life |
| User is isolated from real life | User interacts with the real world through digital overlays |
| Examples: Oculus Rift & Kinect | Examples: Google Glass & HoloLens |
Methods for rendering Augmented Reality
- Computer Vision
- Object Recognition
- Eye Tracking
- Face Detection and substitution
- Emotion and gesture picker
- Edge Detection
Building a web-based Augmented Reality platform requires several groups of components for an end-to-end AR solution: web components such as WebRTC getUserMedia, the Web Speech API, CSS, SVG, HTML5 canvas, and sensor APIs; hardware components such as graphics drivers, media-capture devices (microphone and camera), and sensors; and 3D components such as geometry and math utilities, 3D model loaders and models, lights, materials, shaders, particles, and animation.
WebRTC (Web-based Real-Time Communications)
WebRTC exposes the browser's media streams and data channels. It is standardized at the API level by the W3C and at the protocol level by the IETF. WebRTC enables browser-to-browser applications for voice calling, video chat, and P2P file sharing without plugins, equipping web browsers with Real-Time Communications (RTC) capabilities.
Code snippet for WebRTC API
1. To begin with WebRTC, we first need to check whether the user's browser supports the getUserMedia API before requesting access to the webcam.
function hasGetUserMedia() {
  // check for the standard mediaDevices API (older examples used the webkit-prefixed navigator.webkitGetUserMedia)
  return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}
2. Get the stream from the user's webcam.
var video = $('#webcam')[0];
if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  navigator.mediaDevices.getUserMedia({ audio: true, video: true })
    .then(function(stream) {
      // attach the webcam stream to the <video> element and start playback
      video.srcObject = stream;
      video.play();
    })
    .catch(function(e) {
      alert('Webcam error! ' + e.name);
    });
}
Screenshot AppRTC

End to End RTC Pipeline for AR

WebGL
- Web Graphics Library
- JavaScript API for rendering interactive 2D and 3D computer graphics in browser
- no plugins
- uses GPU (Graphics Processing Unit) acceleration
- can mix with other HTML elements
- uses the HTML5 canvas element and is accessed using Document Object Model interfaces
- cross-platform, works on all major desktop and mobile browsers
WebGL Development
To get started, you should know about the following (a minimal example follows this list):
- GLSL, the shading language used by OpenGL and WebGL
- Matrix computation to set up transformations
- Vertex buffers to hold data about vertex positions, normals, colors, and textures
- Matrix math to animate shapes
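To make these pieces concrete, here is a minimal, self-contained sketch (not from the original post) that compiles a GLSL shader pair, fills a vertex buffer with one triangle, and draws it. It assumes a &lt;canvas id="glcanvas"&gt; element exists on the page.
[sourcecode language="javascript"]
// Minimal WebGL example: compile a GLSL shader pair, upload one triangle
// into a vertex buffer, and draw it. Assumes a <canvas id="glcanvas"> element.
var gl = document.getElementById( 'glcanvas' ).getContext( 'webgl' );

var vsSource =
  'attribute vec2 aPosition;' +
  'void main() { gl_Position = vec4( aPosition, 0.0, 1.0 ); }';
var fsSource =
  'precision mediump float;' +
  'void main() { gl_FragColor = vec4( 1.0, 0.5, 0.0, 1.0 ); }';

function compile( type, source ) {
  var shader = gl.createShader( type );
  gl.shaderSource( shader, source );
  gl.compileShader( shader );
  return shader;
}

var program = gl.createProgram();
gl.attachShader( program, compile( gl.VERTEX_SHADER, vsSource ) );
gl.attachShader( program, compile( gl.FRAGMENT_SHADER, fsSource ) );
gl.linkProgram( program );
gl.useProgram( program );

// vertex buffer holding three 2D positions (one triangle)
var buffer = gl.createBuffer();
gl.bindBuffer( gl.ARRAY_BUFFER, buffer );
gl.bufferData( gl.ARRAY_BUFFER, new Float32Array([ 0, 0.8, -0.8, -0.8, 0.8, -0.8 ]), gl.STATIC_DRAW );

var aPosition = gl.getAttribLocation( program, 'aPosition' );
gl.enableVertexAttribArray( aPosition );
gl.vertexAttribPointer( aPosition, 2, gl.FLOAT, false, 0, 0 );

gl.clearColor( 0, 0, 0, 1 );
gl.clear( gl.COLOR_BUFFER_BIT );
gl.drawArrays( gl.TRIANGLES, 0, 3 );
[/sourcecode]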
Clearly, WebGL is a bit tough given the amount of careful coding, mapping, and shading it requires.
Let's move on to some JS libraries that can make 3D easier for us.
CCV
- Website: http://libccv.org/
- Source code: https://github.com/liuliu/ccv
Awe.js
- Website: https://buildar.com/awe/tutorials/intro_to_awe.js/index.html#
- Source code: https://github.com/buildar/awe.js
ArUco (js-aruco)
- Source code: https://github.com/jcmellado/js-aruco
Potree
- Website: http://potree.org/wp/
- Source code: https://github.com/potree
- Demo: http://potree.org/wp/demo/
motionEmotion (by karenpeng)
Emotion & gesture-based arpeggiator and synthesizer.
- Source code: https://github.com/karenpeng/motionEmotion
- Demo: http://motionemotion.herokuapp.com/
Three.js
An MIT-licensed JavaScript 3D engine, i.e. WebGL + more.
- website : http://threejs.org/
- SourceCode : https://github.com/mrdoob/three.js/
- Demo: http://www.davidscottlyons.com/threejs/
3D space with webcam input as texture
Display the video as a plane that can be viewed from various angles against a given background landscape. Credits for the code below: https://stemkoski.github.io/Three.js/
1. Use the code from slide 10 to get the user's webcam input through getUserMedia.
2. Make a scene, camera, and renderer as previously described.
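The original slides carry this setup step; a minimal sketch of what it typically looks like (variable names here are placeholders, not the exact code from the slides) is:
[sourcecode language="javascript"]
// Minimal scene / camera / renderer setup (a sketch, not the code from the slides)
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera( 45, window.innerWidth / window.innerHeight, 0.1, 20000 );
var renderer = new THREE.WebGLRenderer( { antialias: true } );
renderer.setSize( window.innerWidth, window.innerHeight );
document.body.appendChild( renderer.domElement );
[/sourcecode]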
3. Add orbit controls for viewing the media plane from all angles.
controls = new THREE.OrbitControls( camera, renderer.domElement );
4. Make the floor with an image texture.
[sourcecode language="javascript"]
// tile an image texture across a large plane and lay it flat as the floor
var floorTexture = THREE.ImageUtils.loadTexture( 'imageURL.jpg' );
floorTexture.wrapS = floorTexture.wrapT = THREE.RepeatWrapping;
floorTexture.repeat.set( 10, 10 );
var floorMaterial = new THREE.MeshBasicMaterial( { map: floorTexture, side: THREE.DoubleSide } );
var floorGeometry = new THREE.PlaneGeometry( 1000, 1000, 10, 10 );
var floor = new THREE.Mesh( floorGeometry, floorMaterial );
floor.position.y = -0.5;
floor.rotation.x = Math.PI / 2;
scene.add( floor );
[/sourcecode]
5. Add fog.
scene.fog = new THREE.FogExp2( 0x9999ff, 0.00025 );
6. Add the video image context and texture.
video = document.getElementById( 'monitor' );
videoImage = document.getElementById( 'videoImage' );
videoImageContext = videoImage.getContext( '2d' );
// fill the canvas with black until the first video frame arrives
videoImageContext.fillStyle = '#000000';
videoImageContext.fillRect( 0, 0, videoImage.width, videoImage.height );
videoTexture = new THREE.Texture( videoImage );
videoTexture.minFilter = THREE.LinearFilter;
videoTexture.magFilter = THREE.LinearFilter;
// map the video texture onto a plane placed in the scene
var movieMaterial = new THREE.MeshBasicMaterial( { map: videoTexture, overdraw: true, side: THREE.DoubleSide } );
var movieGeometry = new THREE.PlaneGeometry( 100, 100, 1, 1 );
var movieScreen = new THREE.Mesh( movieGeometry, movieMaterial );
movieScreen.position.set( 0, 50, 0 );
scene.add( movieScreen );
7. Set the camera position.
camera.position.set( 0, 150, 300 );
camera.lookAt( movieScreen.position );
8. Define the render function.
function render() {
  // copy the current webcam frame onto the 2D canvas and flag the texture for refresh
  videoImageContext.drawImage( video, 0, 0, videoImage.width, videoImage.height );
  videoTexture.needsUpdate = true;
  renderer.render( scene, camera );
}
9. Run the animation loop.
function animate() {
  requestAnimationFrame( animate );
  render();
}
animate();

WASM/OpenGL
WASM (WebAssembly) is a portable binary-code format with a corresponding text format. It facilitates interaction between C++ programs and their host environment, such as JavaScript code in the browser.
Emscripten is a compiler toolchain that can compile C++ to WASM.
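As an illustration only, here is a minimal sketch of loading a WASM module from JavaScript; the module name add.wasm and its exported add function are hypothetical placeholders for whatever an Emscripten build might produce.
[sourcecode language="javascript"]
// Hypothetical example: fetch and instantiate a WASM module, then call an export.
// 'add.wasm' and its 'add' export are placeholders, not files from this project.
WebAssembly.instantiateStreaming( fetch( 'add.wasm' ), {} )
  .then( function( result ) {
    console.log( '2 + 3 =', result.instance.exports.add( 2, 3 ) );
  } );
[/sourcecode]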
GPU support for AR
- Compiler / CUDA / OpenGL / Vulkan / graphics / Fortran / GPGPU development
- Web media APIs like MSE (Media Source Extensions) and EME (Encrypted Media Extensions)
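For a flavour of MSE, here is a minimal sketch (not from the original post) that feeds a fetched MP4 segment into a video element through a SourceBuffer; the segment URL and codec string are placeholders.
[sourcecode language="javascript"]
// Minimal MSE sketch: append one fetched MP4 segment to a <video> element.
var video = document.querySelector( 'video' );
var mediaSource = new MediaSource();
video.src = URL.createObjectURL( mediaSource );

mediaSource.addEventListener( 'sourceopen', function() {
  // codec string and segment URL are placeholders
  var sourceBuffer = mediaSource.addSourceBuffer( 'video/mp4; codecs="avc1.42E01E"' );
  fetch( 'segment.mp4' )
    .then( function( response ) { return response.arrayBuffer(); } )
    .then( function( data ) { sourceBuffer.appendBuffer( data ); } );
} );
[/sourcecode]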
AR Processing pipeline

Credits: MediaPipe, Google AI
The on-device machine learning pipeline consists of a platform solution such as MediaPipe above, together with WASM. The WASM SIMD (Single Instruction, Multiple Data, for parallel processing) ML inference can use XNNPACK or any other mobile neural-network inference framework. This is followed by rendering.
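As a sketch of what such a pipeline looks like from JavaScript, the snippet below uses the MediaPipe Selfie Segmentation JS solution. It assumes the @mediapipe/selfie_segmentation script is already loaded and that a playing webcam video element and an output canvas exist on the page.
[sourcecode language="javascript"]
// Sketch: run MediaPipe selfie segmentation on webcam frames and draw the mask.
// Assumes the @mediapipe/selfie_segmentation script tag is loaded and that
// #webcam (a playing <video>) and #output (a <canvas>) exist on the page.
var canvas = document.getElementById( 'output' );
var ctx = canvas.getContext( '2d' );
var video = document.getElementById( 'webcam' );

var selfieSegmentation = new SelfieSegmentation( {
  locateFile: function( file ) {
    return 'https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/' + file;
  }
} );
selfieSegmentation.setOptions( { modelSelection: 1 } );
selfieSegmentation.onResults( function( results ) {
  // results.segmentationMask holds the per-pixel foreground mask
  ctx.drawImage( results.segmentationMask, 0, 0, canvas.width, canvas.height );
} );

function processFrame() {
  selfieSegmentation.send( { image: video } ).then( function() {
    requestAnimationFrame( processFrame );
  } );
}
processFrame();
[/sourcecode]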
GPU-accelerated segmentation (WebGL) outperforms CPU segmentation using WASM SIMD, taking the latency down from ~8.7 ms to ~4.3 ms. A novel WebGL interface can achieve this via optimized fragment shaders using MRT (multiple render targets).

Credits: Intel video analytics pipeline, p. 7
GStreamer Video Analytics