The more I think about the aspects that make the web great, the more I wish they could be applied to video.
Video, in the traditional sense, is dumb. It’s a single-direction, force-fed slop of visual and auditory data that can only really be processed by humans.
When I think about what we’re seeing and visually processing, I think about our ability to identify and give context to most of the artifacts in view. We can distinguish one visual element from another. Our hearing can break apart individual sounds and place them at an approximate position in 3D space. And when things in our field of view move, we have time and space in play too.
Why can’t video be like that? How can we make video intelligible to machines without replicating human-like processors in machines? These questions have been bothering me for a while, so I finally decided to start jotting down some answers and observations.
Video as object markup
Imagine a video contained layers, where each layer represented ‘objects’ in the video. These objects could be visual, such as a cat object moving towards a laser pointer object.
Other objects could represent, for instance, the sounds in the video, the camera, or the environment and location of the shot.
Imagine you could interrogate this video as you would an API.
As a user you could hover over the visual elements, target an object and gain more information about the target.
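To make that concrete, here is a minimal sketch of what a layered object model might look like as plain data, and what “interrogating it like an API” could mean. Everything here — the layer shapes, the object fields, the `objectsAt` function — is invented for illustration, not an existing format.

```javascript
// A hypothetical layered object model for one shot of a video.
// All names (layers, objectsAt, appears/disappears) are made up for illustration.
const layers = [
  {
    kind: 'visual',
    objects: [
      { id: 'cat',   label: 'Cat',           appears: 0.0, disappears: 9.5 },
      { id: 'laser', label: 'Laser pointer', appears: 1.2, disappears: 8.0 },
    ],
  },
  {
    kind: 'audio',
    objects: [
      { id: 'meow', label: 'Meow', appears: 3.1, disappears: 3.6 },
    ],
  },
  {
    kind: 'meta',
    objects: [
      { id: 'camera', label: 'Handheld camera, living room', appears: 0.0, disappears: 9.5 },
    ],
  },
];

// "Interrogate the video as you would an API": which objects exist at time t?
function objectsAt(t) {
  return layers.flatMap((layer) =>
    layer.objects
      .filter((o) => o.appears <= t && t <= o.disappears)
      .map((o) => ({ ...o, kind: layer.kind }))
  );
}

console.log(objectsAt(3.3).map((o) => o.id)); // ['cat', 'laser', 'meow', 'camera']
```

A hover-to-inspect UI would just be a front end over a query like `objectsAt` — the hard part is producing the layer data in the first place.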
What about SVG as intelligent video?
(If you can’t see the wikipedia SVG above, upgrade your browser already!)
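Part of SVG’s appeal here is that a vector “frame” can carry animation and machine-readable labels in the same document. The snippet below builds a tiny SVG scene as a string — `<animate>` and `<title>` are real SVG/SMIL, but the `data-object` labelling convention and the scene itself are made up.

```javascript
// Build a tiny SVG "video frame" where each moving shape is also a
// labelled, machine-readable object. <animate> is standard SMIL; the
// data-object attribute and the scene are invented for illustration.
function catAndLaserScene() {
  return [
    '<svg xmlns="http://www.w3.org/2000/svg" width="320" height="240">',
    '  <circle data-object="laser" cx="300" cy="120" r="4" fill="red">',
    '    <title>Laser pointer</title>',
    '  </circle>',
    '  <rect data-object="cat" x="10" y="100" width="60" height="40" fill="grey">',
    '    <title>Cat</title>',
    '    <!-- the cat object chases the laser over 3 seconds -->',
    '    <animate attributeName="x" from="10" to="250" dur="3s" fill="freeze" />',
    '  </rect>',
    '</svg>',
  ].join('\n');
}

// Unlike a bitmap frame, this "frame" can be searched like text:
const scene = catAndLaserScene();
console.log(scene.includes('data-object="cat"')); // true
```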
Image formats and a little magic
Videos are basically a complex blending of still images. But what if we could break a video up into component layers and then animate a set of them? What if these layers could use blended transparency and be overlapped?
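That “blended transparency” idea is just ordinary alpha compositing, applied frame by frame. Here is a per-pixel sketch of the standard “over” operator — the layer stack and pixel values are invented for illustration.

```javascript
// Standard "over" alpha compositing of one RGBA pixel onto another.
// Colour channels are 0-255, alpha is 0-1.
function over(top, bottom) {
  const a = top.a + bottom.a * (1 - top.a);
  const blend = (t, b) =>
    a === 0 ? 0 : Math.round((t * top.a + b * bottom.a * (1 - top.a)) / a);
  return { r: blend(top.r, bottom.r), g: blend(top.g, bottom.g), b: blend(top.b, bottom.b), a };
}

// Composite a stack of layers (bottom layer first) at one pixel position.
function compositePixel(layers) {
  return layers.reduce((acc, layer) => over(layer, acc));
}

// An opaque blue background with a half-transparent red layer over it:
const pixel = compositePixel([
  { r: 0, g: 0, b: 255, a: 1 },   // opaque blue background
  { r: 255, g: 0, b: 0, a: 0.5 }, // 50% red overlay
]);
console.log(pixel); // { r: 128, g: 0, b: 128, a: 1 }
```

Run that for every pixel of every frame and you have overlapping, transparent video layers.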
Youtube’s recent endeavours in video markup
Youtube has been making great moves towards more intelligent video – we’re seeing annotations, subtitles and simple interactivity. But it’s not there yet: it’s still a one-way conversation where I’m forced to listen at a set pace, unable to interact with or influence my experience.
What about user controls?
Simple play, pause and skip controls aren’t where we should stop in terms of innovation. Choosing a separate audio stream, changing playback speed, and switching on markup should all be standard options available to us.
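In API terms, richer controls are just more player state that the viewer is allowed to set. Here is a sketch of what that surface might look like — only `playbackRate` mirrors a real property of the HTML5 `<video>` element; `audioTrack` and `markupVisible` are hypothetical.

```javascript
// A hypothetical player-settings surface. playbackRate mirrors the real
// HTML5 <video> property; audioTrack and markupVisible are invented.
function createPlayerSettings() {
  const settings = { playbackRate: 1.0, audioTrack: 'original', markupVisible: false };
  return {
    get: () => ({ ...settings }),
    setPlaybackRate(rate) {
      if (rate <= 0) throw new RangeError('playback rate must be positive');
      settings.playbackRate = rate;
    },
    selectAudioTrack(name) { settings.audioTrack = name; },
    toggleMarkup() { settings.markupVisible = !settings.markupVisible; },
  };
}

const player = createPlayerSettings();
player.setPlaybackRate(1.5);                    // watch faster
player.selectAudioTrack('director-commentary'); // separate audio stream
player.toggleMarkup();                          // switch object markup on
console.log(player.get());
// { playbackRate: 1.5, audioTrack: 'director-commentary', markupVisible: true }
```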
Maybe my hopes are a long way off, but I feel like we’re making progress. Maybe we’ll see Google start indexing the contents of videos, providing the lyrics to the songs you watch on Youtube, or letting a search for your favourite movie quote drop you into the actual movie playback, at just the right marker.
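That last idea would only need a time-indexed transcript to search against. A toy sketch — the transcript format, the quotes, and `findQuote` are all invented for illustration:

```javascript
// Search a time-indexed transcript for a quote and return the seek marker.
// The transcript data and findQuote are hypothetical.
const transcript = [
  { start: 12.0, text: 'Hello there.' },
  { start: 48.5, text: "Frankly, my dear, I don't give a damn." },
  { start: 90.2, text: 'The end.' },
];

function findQuote(transcript, quote) {
  const q = quote.toLowerCase();
  const hit = transcript.find((line) => line.text.toLowerCase().includes(q));
  return hit ? hit.start : null; // seconds to seek to, or null if absent
}

console.log(findQuote(transcript, 'frankly, my dear')); // 48.5
```

With markers like these, a search result could seek the player straight to the moment the quote is spoken.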