Deep learning methods
13 April 2018
The application of deep learning methods has made an extraordinary contribution to the fields of artificial intelligence and machine learning. In a mere five years, we've gone from near-unusable image recognition and speech transcription to superhuman performance on these tasks. Many of these methods can also be used in other settings, and the consequences of this sudden progress extend to almost every industry.
To appreciate the potential impact, it is worthwhile to first define what is meant by deep learning methods. Deep learning is a specific sub-field of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations. The 'deep' in deep learning isn't a reference to any kind of deeper understanding that the methods seek to achieve; rather, it represents the idea that the model incorporates successive layers of representations. Hence, the number of layers incorporated in a model is referred to as the depth of the model.
In a sense, these models share ideas with the hierarchical models found in Bayesian statistics; however, modern deep learning often involves tens or even hundreds of successive layers of representations. In addition, each layer in the model learns from its exposure to the data, with the aim of identifying the rules that generate a particular outcome. Within the field of deep learning, these layered representations are (almost always) reliant on models that use the neural network framework, a mathematical technique whose name and early ideas were inspired by neurobiology. It is important to note that deep-learning models are not models of the brain; the word neural is just a reference to the field that inspired the technique.
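To make the idea of depth concrete, the following is a minimal sketch in plain Python, not taken from any particular library. It assumes a simple fully connected layer with a tanh activation; the layer sizes, the helper names (make_layer, apply_layer, forward) and the input values are hypothetical and chosen only for illustration. Each layer turns the previous representation into a new one, and the depth is simply the number of layers.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return one dense layer as (weights, biases) with random starting weights."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def apply_layer(layer, inputs):
    """Compute tanh(W x + b): one simple transformation of the previous representation."""
    weights, biases = layer
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(layers, inputs):
    """Pass the input through every layer, keeping each intermediate representation."""
    representations = [inputs]
    for layer in layers:
        representations.append(apply_layer(layer, representations[-1]))
    return representations


rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # hypothetical: 4 inputs, two hidden layers, 2 outputs
layers = [
    make_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

depth = len(layers)  # the depth of the model is the number of layers
reps = forward(layers, [0.5, -0.2, 0.1, 0.9])  # hypothetical input vector
print("depth:", depth)
print("representation sizes:", [len(r) for r in reps])
```

The weights here are random, so the representations are not yet meaningful; in a real model they would be learned from data.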
Deep learning rose to prominence in the early 2010s, and in the few years since it has achieved nothing short of a revolution in the machine learning field, with remarkable results on perceptual problems such as seeing and hearing: problems involving skills that seem natural and intuitive to humans but have long been elusive for machines.
In particular, deep learning has achieved the following breakthroughs, all in historically difficult areas of machine learning: near-human-level image classification, near-human-level speech recognition, near-human-level handwriting transcription, improved machine translation, improved text-to-speech conversion, digital assistants such as Google Now and Amazon Alexa, near-human-level autonomous driving, improved ad targeting (as used by Google, Baidu, and Bing), improved search results on the web, the ability to answer natural-language questions, and superhuman Go playing. More recently, we've started applying these techniques to a wide variety of problems outside of machine perception and natural-language understanding, such as formal reasoning. If successful, this may herald an age where deep learning assists humans in science, software development, and more.
Deep learning methods have not only provided more accurate results in a number of applications; they also make problem solving much easier, because they automate one of the most time-consuming phases of the machine-learning workflow: feature engineering, the initial transformation of the input data to make it more amenable to processing. Traditional machine-learning methods require these features to be designed by hand. In contrast, with deep learning, you learn all features in one pass rather than having to engineer them yourself. This has greatly simplified traditional workflows, often replacing sophisticated multi-stage pipelines with a single, simple, end-to-end deep-learning model.
You may ask: if the crux of the issue is to have multiple successive layers of representations, could traditional machine-learning methods be applied repeatedly to emulate the effects of deep learning? In practice, there are fast-diminishing returns to successive applications of machine-learning methods, because the optimal first representation layer in a three-layer model isn't the optimal first layer in a one-layer or two-layer model. What is transformative about deep learning is that it allows the model to learn all layers of representation jointly and at the same time, rather than in succession. With joint feature learning, whenever the model adjusts one of its internal features, all other features that depend on it automatically adapt to the change, without requiring human intervention. Everything is supervised by a single feedback signal: every change in the model serves the end goal. This is much more powerful than stacking machine-learning models, because it allows complex, abstract representations to be learned by breaking them down into a long series of intermediate spaces (layers); each space is only a simple transformation away from the previous one.
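The joint learning described above can be sketched with a tiny two-layer network written in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the toy XOR data, the squared-error loss, the hidden-layer size, the learning rate and the helper names (init_params, predict, mean_loss, train_step) are all hypothetical choices. The point to notice is that a single error term, err, drives the updates of both layers in the same step.

```python
import math
import random

# Toy data (XOR), chosen purely for illustration.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def init_params(n_hidden, rng):
    """Random starting weights for a 2-input, n_hidden-unit, 1-output network."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def predict(p, x):
    """Layer 1: tanh hidden representation. Layer 2: linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(p["W1"], p["b1"])
    ]
    output = sum(w * h for w, h in zip(p["W2"], hidden)) + p["b2"]
    return hidden, output


def mean_loss(p):
    """Mean of 0.5 * (prediction - target)^2 over the toy data."""
    return sum(0.5 * (predict(p, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)


def train_step(p, lr):
    """One full-batch gradient step that updates both layers from the same error."""
    n = len(DATA)
    g_w1 = [[0.0, 0.0] for _ in p["W1"]]
    g_b1 = [0.0] * len(p["b1"])
    g_w2 = [0.0] * len(p["W2"])
    g_b2 = 0.0
    for x, t in DATA:
        hidden, output = predict(p, x)
        err = output - t  # the single feedback signal
        g_b2 += err / n
        for j, h in enumerate(hidden):
            g_w2[j] += err * h / n  # gradient for the second layer
            d_pre = err * p["W2"][j] * (1.0 - h * h)  # same error, sent back to layer 1
            g_b1[j] += d_pre / n
            for i, xi in enumerate(x):
                g_w1[j][i] += d_pre * xi / n
    # Apply all updates together, after every gradient has been computed.
    for j in range(len(p["W2"])):
        p["W2"][j] -= lr * g_w2[j]
        p["b1"][j] -= lr * g_b1[j]
        for i in range(2):
            p["W1"][j][i] -= lr * g_w1[j][i]
    p["b2"] -= lr * g_b2


rng = random.Random(0)
params = init_params(n_hidden=4, rng=rng)
w1_before = [row[:] for row in params["W1"]]
w2_before = params["W2"][:]
initial_loss = mean_loss(params)
for _ in range(1000):
    train_step(params, lr=0.05)  # hypothetical learning rate
final_loss = mean_loss(params)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Stacking separately trained models would instead fix the first layer before the second is fitted, so the first layer could never adapt to what the second layer needs. Here the term d_pre carries the output error back to the first layer, which is what lets both layers change together.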
These are the two essential characteristics of how deep learning learns from data: the incremental, layer-by-layer way in which increasingly complex representations are developed, and the fact that these intermediate representations are learned jointly, each layer being updated to follow both the representational needs of the layer above and the needs of the layer below. Together, these two properties have made deep learning vastly more successful than previous approaches to machine learning in a number of important applications. However, these methods will not be superior in every application, so the data scientist needs to know when to apply each of the different model structures. This judgement is usually acquired through experience, as there have been a number of cases where deep learning methods have provided unsatisfactory results, with the model largely fitting the noise around a particular outcome rather than the underlying pattern.
Hence, we should acknowledge that deep learning methods have only been in the spotlight for a few years, and we haven't yet established the full scope of what they can do. With every passing month, we learn about new use cases and engineering improvements that lift previous limitations. In the wake of a scientific revolution, progress generally follows a sigmoid curve: it starts with a period of fast progress, which gradually stabilises as researchers hit hard limitations, and then further improvements become incremental. Deep learning in 2018 seems to be in the first half of that sigmoid, with much more progress to come in the next few years.