
Video is dumb

Posted: September 18th, 2012

The more I think about the aspects that make the web great, the more I wish they could be applied to video.
Video, in the traditional sense, is dumb. It’s a single-direction, force-fed slop of visual and auditory data that can only really be processed by humans.


When I think about what we’re seeing and therefore visually processing, I think about our ability to identify and give context to most of the artifacts we see. We can distinguish one visual element from another. Our hearing can separate out individual sounds and place them at an approximate position in 3D space. And when things in our field of view move, we have time and space in play.
Why can’t video be like that? How can we make video intelligible to machines without replicating human-like processors in machines? These questions have been bothering me for a while, so I finally decided to start jotting down some answers and observations.

Video as object markup

Imagine there were layers in a video, where each layer represented ‘objects’ in it. These objects could be visual, such as a cat object moving towards a laser pointer object.
Further objects could represent, for instance, the sounds in the video, the camera, or the environment and location of the shot.
Imagine you could interrogate this video as you would an API.
As a user you could hover over the visual elements, target an object and gain more information about the target.
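A minimal sketch of what that interrogation might look like, assuming a purely hypothetical JSON layer format in which each object carries a label and keyframed bounding boxes (none of these names come from any real standard):

```javascript
// Hypothetical object-layer metadata for a clip: each object has a
// label and a list of keyframed bounding boxes [x, y, width, height].
const layers = [
  { label: "cat",
    frames: [ { t: 0.0, box: [40, 120, 80, 60] },
              { t: 1.5, box: [90, 110, 80, 60] } ] },
  { label: "laser pointer",
    frames: [ { t: 0.0, box: [300, 100, 10, 10] } ] }
];

// "Interrogate the video as you would an API": which objects exist
// at time t?
function objectsAt(layers, t) {
  return layers.filter(o => o.frames.some(f => f.t <= t));
}

// And which object sits under the point (x, y) at time t?
function hitTest(layers, t, x, y) {
  for (const o of objectsAt(layers, t)) {
    // Use the most recent keyframe at or before t.
    const f = o.frames.filter(f => f.t <= t).pop();
    const [bx, by, w, h] = f.box;
    if (x >= bx && x <= bx + w && y >= by && y <= by + h) return o.label;
  }
  return null;
}
```

A hover handler would simply map the pointer position into video coordinates and call `hitTest` to decide what to show.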

What about SVG as intelligent video?

When SVG was gaining ground in the early 21st century, I was eagerly downloading Adobe’s SVG plugin for IE and testing out some of SVG’s purported capabilities. One such capability was modification of the SVG DOM via JavaScript. You could effectively animate an SVG by manipulating it, creating complex fractals or the more mundane transforms that CSS3 allows these days, such as this rolling football:
Animated SVG football
(If you can’t see the wikipedia SVG above, upgrade your browser already!)
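That kind of DOM-driven animation boils down to rewriting an element’s `transform` attribute on a timer. A sketch, with the geometry pulled out into a plain function (the element id below is an assumption, not a real page):

```javascript
// Roll a ball: for a circle of radius r that has travelled x pixels,
// the matching rotation is the arc length unwound, in degrees.
function rollTransform(x, r) {
  const degrees = (x / r) * (180 / Math.PI);
  return "translate(" + x + ",0) rotate(" + degrees.toFixed(2) + ")";
}

// In the browser you would apply it to the SVG node each tick,
// e.g. (assuming an element with id "football" and radius 20):
//
// const ball = document.getElementById("football");
// let x = 0;
// setInterval(() => {
//   x += 2;
//   ball.setAttribute("transform", rollTransform(x, 20));
// }, 16);
```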

Image formats and a little magic

Videos are basically the complex blending of still images. But what if we could break up a video into component layers and then animate a set of them? What if these layers could use blended transparency and be overlapped?

Recently I noticed a few posts and tweets about using image formats, JSON and a little JavaScript to make more ‘interactive’ playbacks. First Apple’s new iPhone 5 design page featured some nifty animation tricks, then Jon Skinner’s article about the sublimetext.com animations cropped up. It’s a step in the right direction too.
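The core trick behind those pages is simple: instead of force-feeding frames at a fixed rate, a sequence of stills is scrubbed by some user input such as scroll position. A sketch, assuming made-up frame file names:

```javascript
// Scroll-driven "playback": the user's scroll position, normalised
// to [0, 1], picks which still image to show.
const frameCount = 120;

function frameForScroll(scrollFraction) {
  // Clamp to [0, 1], then map to a frame index.
  const f = Math.min(1, Math.max(0, scrollFraction));
  return Math.min(frameCount - 1, Math.floor(f * frameCount));
}

function frameSrc(i) {
  // Hypothetical naming scheme: frames/step_000.png ... step_119.png
  return "frames/step_" + String(i).padStart(3, "0") + ".png";
}

// Browser wiring (element id assumed):
//
// const img = document.getElementById("player");
// window.addEventListener("scroll", () => {
//   const f = window.scrollY /
//             (document.body.scrollHeight - window.innerHeight);
//   img.src = frameSrc(frameForScroll(f));
// });
```

Because each frame is an ordinary image element, the layers can carry transparency and be overlapped and blended, which a flat video stream can’t do.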

Youtube’s recent endeavours in video markup

Youtube has been making great moves towards more intelligent video: we’re seeing annotations, subtitles and simple interactivity. But I feel like it’s not there yet; it’s still a one-way conversation where I’m forced to listen at a set pace, with no way to interact with or influence my experience.

What about user controls?

Simple play, pause and skip controls aren’t where we should stop in terms of innovation. Choosing a separate audio stream, changing playback speed and switching on markup should all be standard options available to us.
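Some of those knobs already exist on the HTML5 media element: `playbackRate` changes speed, and `textTracks` can be toggled between shown and disabled. A sketch of applying user-chosen settings (the settings object shape is my own; audio track switching is left out since browser support for it varied):

```javascript
// Apply user-chosen playback settings to a media element.
// `video` is an HTMLMediaElement (or any object with the same shape).
function applySettings(video, settings) {
  video.playbackRate = settings.speed; // e.g. 0.5, 1, 2

  // "Switching on markup": show or hide subtitle/metadata tracks.
  for (const track of video.textTracks || []) {
    track.mode = settings.showTracks ? "showing" : "disabled";
  }
  return video;
}

// In the browser:
// applySettings(document.querySelector("video"),
//               { speed: 1.5, showTracks: true });
```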

Video is broken to machines

Maybe my hopes are a long way off, but I feel like we’re making progress. Maybe we’ll see Google start indexing videos, providing the lyrics to the songs you watch on Youtube, or letting a search for your favourite movie quote drop you into the actual movie playback (at just the right marker).
