AR/VR on WebRTC WebGL , Three.js and WebRTC

For the last couple of weeks , I have been working on the concept of rendering 3D graphics on WebRTC media stream using different JavaScript libraries as part of a Virtual Reality project .


What is Augmented Reality ?

Augmented reality (AR) is viewing a real-world environment with elements that are supplemented by computer-generated sensory inputs such as sound, video, graphics , location etc.

How is AR diff. from VR(Virtual Reality) ?

Virtual RealityAugmented Reality
replaces the real world with simulated one , user is isolated from real life , Examples – Oculus Rift & Kinectblending of virtual reality and real life , user interacts with real world through digital overlays , Examples – Google glass & Holo Lens

Methods for rendering augmented Reality

  • Computer Vision
  • Object Recognition
  • Eye Tracking
  • Face Detection and substitution
  • Emotion and gesture picker
  • Edge Detection

Web based Augmented Reality platform building has for a Web base Components for end-to-end AR solution such as WebRTC getusermedia , Web Speech API, css, svg, HTML5 canvas, sensor API. Hardware components that can include Graphics driver, media capture devices such as microphone and camera, sensors. 3D Components like Geometry and Math Utilities, 3D Model Loaders and models, Lights, Materials,Shaders, Particles, Animation.

WebRTC (Web based Real Time communications)

Browser’s media stream and data. Standardization , on a API level at the W3C and at the protocol level at the IETF. WebRTc enables browser to browser applications for voice calling, video chat and P2P file sharing without plugins.Enables web browsers with Real-Time Communications (RTC) capabilities.

Code snippet for WebRTC API

1.To begin with WebRTC we first need to validate that the browser has permission to access the webcam. Find out if the user’s browser can use the getUserMedia API.

function hasGetUserMedia() {
	return !!(navigator.webkitGetUserMedia);
}
  1. Get the stream from the user’s webcam.
var video = $('#webcam')[0];
if (navigator.webkitGetUserMedia) {
         navigator.webkitGetUserMedia(
			{audio:true, video:true},
			function(stream) { video.src = window.webkitURL.createObjectURL(stream);  },
			function(e) {	alert('Webcam error!', e); }
		);
}

Screenshot AppRTC

https://apprtc.appspot.com

Augmented Reality in WebRTC Browser - Google Slides

End to End RTC Pipeline for AR

WebGL

  • Web Graphics Library
  • JavaScript API for rendering interactive 2D and 3D computer graphics in browser
  • no plugins
  • uses GPU ( Graphics Processing Unit ) acceleration
  • can mix with other HTML elements
  • uses the HTML5 canvas element and is accessed using Document Object Model interfaces
  • cross platform , works on all major Desktop and mobile browsers

WebGL Development

To get started you should know about :

  • GLSL, the shading language used by OpenGL and WebGL
  • Matrix computation to set up transformations
  • Vertex buffers to hold data about vertex positions, normals, colors, and textures
  • matrix math to animate shapes

Cleary WebGL is bit tough given the amount of careful coding , mapping and shading it requires .

Proceeding to some JS libraries that can make 3D easy for us .

CCV

Awe.js

ArcuCO

Potree

Karenpeng

emotion & gesture-based arpeggiator and synthesizer

Three.JS

MIT license javascript 3D engine ie ( WebGL + more).

3D space with webcam input as texture

Display the video as a plane which can be viewed from various angles in a given background landscape. Credits for below code : https://stemkoski.github.io/Three.js/

1.Use code from slide 10 to get user’s webcam input through getUserMedia

  1. Make a Screen , camera and renderer as previously described

  2. Give orbital CONTROLS for viewing the media plane from all angles
	controls = new THREE.OrbitControls( camera, renderer.domElement );
 

Make the FLOOR with an image texture

[sourcecode language="html"]
	var floorTexture = new THREE.ImageUtils.loadTexture( 'imageURL.jpg' );
	floorTexture.wrapS = floorTexture.wrapT = THREE.RepeatWrapping;
	floorTexture.repeat.set( 10, 10 );
	var floorMaterial = new THREE.MeshBasicMaterial({map: floorTexture, side: THREE.DoubleSide});
	var floorGeometry = new THREE.PlaneGeometry(1000, 1000, 10, 10);
	var floor = new THREE.Mesh(floorGeometry, floorMaterial);
	floor.position.y = -0.5;
	floor.rotation.x = Math.PI / 2;
	scene.add(floor)
[/sourcecode]


6. Add Fog


scene.fog = new THREE.FogExp2( 0x9999ff, 0.00025 );

7.Add video Image Context and Texture.

video = document.getElementById( 'monitor' );
videoImage = document.getElementById( 'videoImage' );
videoImageContext = videoImage.getContext( '2d' );
videoImageContext.fillStyle = '#000000';
videoImageContext.fillRect( 0, 0, videoImage.width, videoImage.height );
videoTexture = new THREE.Texture( videoImage );
videoTexture.minFilter = THREE.LinearFilter;
videoTexture.magFilter = THREE.LinearFilter;
var movieMaterial=new THREE.MeshBasicMaterial({map:videoTexture,overdraw:true,side:THREE.DoubleSide});
var movieGeometry = new THREE.PlaneGeometry( 100, 100, 1, 1 );
var movieScreen = new THREE.Mesh( movieGeometry, movieMaterial );
movieScreen.position.set(0,50,0);
scene.add(movieScreen);
  1. Set camera position
	camera.position.set(0,150,300);
	camera.lookAt(movieScreen.position);
  1. Define the render function
    videoImageContext.drawImage( video, 0, 0, videoImage.width, videoImage.height );
    renderer.render( scene, camera );
  1. Animation
   requestAnimationFrame( animate );
   render();
Augmented Reality in WebRTC Browser - Google Slides (4)

WASM/OpenGL

WASM ( Web Assembly) is portable binary-code format and a corresponding text format. It is used for facilitating interactions between c++ programs and their host environment such as Javascript code into the browsers.

Emscripten is a one of the compiler toolchain and is a way to compile C++ into WASM.

GPL support for AR

Compiler/CUDA/OpenGL/Vulcan/Graphics/Fortran/GPGPU Developer.

Web media APIs like MSE (Media Source Extensions) and EME (Encrypted Media Extensions)

AR Processing pipeline 

credits : Media Pipe Google AI

On deveice Machine Learning piipeline consists of a platform solution such as MediaPipe above along with WASM. The WASM SIMD (Single instruction, multiple data for parallel process ) ML inerface can be XNNPACK or any other mobile platform based neural network inference framework. This is followerd by rendering.

GPU Accelerated segmentation ( WebGL) outperforms CPU Segmentation using WASM SIMD by talking the latency down from ~8.7ms to ~4.3 ms. Novel WebGL interface can via optimized fragment shaders using MRT.

Credits Intel Video analytics pipeline p7

GStreamer Video Analytics


Ref :

3 thoughts on “AR/VR on WebRTC WebGL , Three.js and WebRTC

  1. Your code for creating a webcam view in 3D space looks really familiar. Are you just copying Stemkoski’s three.js code here? If so, you should give him credit.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.