Researchers are beginning to move beyond still photographs as the primary way to teach Artificial Intelligence (A.I.) about the world around it, opening an array of new possible uses for the technology and possibly signaling a step toward genuinely aware A.I. There are, as always, challenges to overcome, and camera-enabled A.I. systems are not going to improve dramatically overnight. Until now, machine learning systems built around machine vision – an A.I. linked to a camera – have centered on analyzing single images and tying the results to image or text databases. That groundwork has proven useful, and it could be argued that without it, analyzing video would be next to impossible. Being able to recognize video or streaming content as it unfolds, however, could represent a huge step forward for A.I.
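To make that distinction concrete, here is a minimal, purely illustrative Python sketch; the frame format and the classify_frame helper are hypothetical stand-ins, not any real system's API. Labeling each frame in isolation only says what is present, while even a crude comparison across frames is needed to say anything about what is happening.

```python
# Illustrative only: per-frame labels vs. a naive look across frames.
from collections import Counter
from typing import List

def classify_frame(frame: dict) -> str:
    """Stand-in for a single-image classifier; these toy frames carry a label already."""
    return frame["label"]

def per_frame_labels(frames: List[dict]) -> List[str]:
    """What most deployed systems do today: label each frame independently."""
    return [classify_frame(f) for f in frames]

def describe_clip(frames: List[dict]) -> str:
    """A crude step toward video understanding: compare frames over time
    to say something about what is happening, not just what is present."""
    labels = per_frame_labels(frames)
    subject = Counter(labels).most_common(1)[0][0]
    positions = [f["x"] for f in frames]
    moving = abs(positions[-1] - positions[0]) > 10  # arbitrary pixel threshold
    return f"{subject} {'moving' if moving else 'stationary'}"

if __name__ == "__main__":
    clip = [{"label": "person", "x": 40 + 5 * i} for i in range(12)]
    print(per_frame_labels(clip)[:3])  # ['person', 'person', 'person']
    print(describe_clip(clip))         # person moving
```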
More specifically, today's machine learning algorithms are still primarily focused on analyzing individual frames, looking for recognized objects, and then using other hardware or algorithms to respond in specific, pre-programmed ways. Even where video is concerned, they are not analyzing what is actually happening in that video. That includes the vast majority of vision systems in self-driving cars, an industry that would certainly benefit from A.I. designed to analyze video footage. While those systems can recognize objects in single frames, most of the computation about what is actually going on around the vehicle relies on data from other instruments, such as LiDAR. A similar mix of sensors and tools supports other A.I.-driven systems, with machine vision acting as an augment to the rest.
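As a rough illustration of that division of labor, the sketch below pairs hypothetical per-frame camera detections with LiDAR range returns. The Detection and LidarReturn types and the bearing-matching fusion are assumptions made for the example, not any manufacturer's actual pipeline; the point is that the camera path names the object, while distance and much of the situational reasoning come from another sensor.

```python
# Hedged sketch (not any vendor's actual pipeline): the camera path only names
# objects frame by frame, while range comes from a separate LiDAR sweep.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:          # hypothetical output of a per-frame vision model
    label: str
    bearing_deg: float    # rough direction the object appears in

@dataclass
class LidarReturn:        # one range measurement from a LiDAR sweep
    bearing_deg: float
    distance_m: float

def fuse(detections: List[Detection], lidar: List[LidarReturn],
         tolerance_deg: float = 5.0) -> List[dict]:
    """Attach a distance to each camera detection by matching bearings.
    The vision model says *what* is there; LiDAR says *where* it is."""
    fused = []
    for det in detections:
        nearest: Optional[LidarReturn] = min(
            (r for r in lidar if abs(r.bearing_deg - det.bearing_deg) <= tolerance_deg),
            key=lambda r: abs(r.bearing_deg - det.bearing_deg),
            default=None,
        )
        fused.append({
            "label": det.label,
            "distance_m": nearest.distance_m if nearest else None,
        })
    return fused

if __name__ == "__main__":
    dets = [Detection("pedestrian", 2.0), Detection("car", -30.0)]
    sweep = [LidarReturn(1.5, 12.4), LidarReturn(-29.0, 35.0)]
    print(fuse(dets, sweep))
    # [{'label': 'pedestrian', 'distance_m': 12.4}, {'label': 'car', 'distance_m': 35.0}]
```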
Adding contextual relevance to machine learning would give systems the ability to recognize the more abstract idea of what is “happening” with a person or object, rather than just what that person or object “is.” It should be said that video-based machine learning is still very much in its infancy, and organizations are taking very different approaches to building the datasets their A.I. will reference. MIT and IBM are creating an index called the Moments in Time Dataset, built around three-second videos covering a wide array of activities. Google’s efforts focus on better recognition of objects, text, and audio in video, using its Cloud Platform and its YouTube-8M Dataset project. Facebook, meanwhile, is focusing on object recognition, scenarios, and actions with its Scenes, Actions, and Objects dataset, using annotations to provide contextual meaning. Finally, Twenty Billion Neurons, a startup based in both Toronto and Berlin, is taking a real-world approach and crowdsourcing its database, paying workers to perform simple tasks on camera that are then added to the collection.
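In loose terms, the sketch below shows what a short-clip action dataset might look like in code. The Clip and ActionDataset structures and their field names are illustrative assumptions, not the actual schema of Moments in Time, YouTube-8M, or Twenty Billion Neurons' data; the key idea is that each record pairs a few seconds of video with an action label, and a temporal model samples frames across the clip rather than classifying a single image.

```python
# Illustrative schema for a short-clip action dataset; field names are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Clip:
    video_path: str
    action_label: str          # e.g. "opening", "running", "pouring"
    duration_s: float = 3.0    # Moments in Time standardizes on roughly 3-second clips

@dataclass
class ActionDataset:
    clips: List[Clip] = field(default_factory=list)

    def add(self, video_path: str, action_label: str) -> None:
        self.clips.append(Clip(video_path, action_label))

    def labels(self) -> List[str]:
        return sorted({c.action_label for c in self.clips})

    def sample_frame_times(self, clip: Clip, n_frames: int = 8) -> List[float]:
        """Evenly spaced timestamps a temporal model might sample from the clip."""
        step = clip.duration_s / n_frames
        return [round(i * step, 3) for i in range(n_frames)]

if __name__ == "__main__":
    ds = ActionDataset()
    ds.add("clips/000001.mp4", "opening")
    ds.add("clips/000002.mp4", "running")
    print(len(ds.clips), ds.labels())                       # 2 ['opening', 'running']
    print(ds.sample_frame_times(ds.clips[0], n_frames=4))   # [0.0, 0.75, 1.5, 2.25]
```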
The different approaches are understandable, since each organization has a very different use in mind. Google, for example, might use its dataset to improve any number of products under its umbrella, while Facebook is primarily focused on advertising and social networking. MIT and IBM, for their part, are more likely interested in exploring what video datasets make possible and in the problems inherent in storing, cleaning, and maintaining them. Collectively, the work should mean substantial improvements to self-driving vehicles, smart cameras, biometrics, digital assistants, content policy enforcement – such as Google’s A.I.-driven policy enforcement on YouTube – and much more.
Given the relatively open nature of the organizations involved, the implications are likely to go much further than incremental improvements to current systems, whether updates or new product iterations. The work could lead to a wealth of ideas, products, and services that simply don’t exist today and are perhaps currently unimaginable. Combined with new advances in hardware, the advent of 5G and other IoT-centric networks, and software optimizations, the ability to genuinely understand the surrounding world in real time could bring A.I. much closer to human cognition. That isn’t likely to happen for quite some time, however, as neural networks are still limited by our incomplete understanding of consciousness and by the hardware constraints of human-built systems.