It has been hard to miss the hype surrounding the most recent wave of developments in machine learning and artificial intelligence (AI); look no further than the media coverage of DeepMind, the Alphabet-owned AI company famous for its AlphaGo program, which beat world number one Go player Ke Jie in 2017. As previously reported, a number of DeepMind’s patent applications were recently published; this article considers one of the products underlying some of those applications and how DeepMind is trying to protect it. In addition, the article looks at the difficulties that patent applications for AI-based inventions are likely to face at the EPO and how these may be overcome.
Two of the published patent applications (WO2018/048934 and WO2018/048945) relate to WaveNet, the technology behind Google Assistant’s realistic-sounding voice. At its heart, a WaveNet uses a form of artificial neural network – a convolutional neural network – to generate audio data. Convolutional neural networks are particularly adept at tasks that have a complex input and a relatively simple output, such as image recognition. A convolutional neural network can efficiently process large images and other sizable inputs by reducing the complexity of the input as part of its internal processing. Clusters of neurons process data from limited receptive fields of the input and pooling layers combine the outputs of several clusters into a single input for the next layer of the network.
However, in contrast to a conventional convolutional neural network, a WaveNet does not use pooling layers. Instead, it uses dilated causal convolution – “dilated” in the sense that each filter skips over its input at a fixed step, so that the outputs of some of the hidden layer nodes are omitted from the calculation at each time step, and “causal” in the sense that the output depends only on values in the past. Stacking layers with increasing dilation enables the WaveNet’s receptive field to be particularly large, covering thousands of previously generated audio samples, which is important when generating 16,000 samples for every second of audio output.
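For readers who prefer to see the idea in code, the following is a minimal sketch of a dilated causal convolution and of how the receptive field grows as layers are stacked. It is an illustration only and is not taken from the patent applications: the function names, the filter size of 2 and the dilation pattern (doubling from 1 to 512, repeated three times) are all assumptions chosen for the example.

```python
def dilated_causal_conv(samples, weights, dilation):
    """1-D causal convolution with a fixed dilation (illustrative sketch).

    output[t] = sum over k of weights[k] * samples[t - k * dilation]
    Positions before the start of the sequence are treated as zero, so
    the output at step t never depends on any later input.
    """
    output = []
    for t in range(len(samples)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # skip back in steps of `dilation`
            if idx >= 0:
                total += w * samples[idx]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input positions that can influence a single output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed stack for illustration: filter size 2, dilations 1, 2, 4, ..., 512,
# with the whole pattern repeated three times.
dilations = [2 ** i for i in range(10)] * 3
field = receptive_field(2, dilations)
```

Under these assumed settings, 30 layers are enough for one output to depend on 3,070 input positions, which shows how a receptive field in the thousands can be reached without pooling.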
In use, text is input to the WaveNet as a sequence of linguistic and phonetic features containing information about, among other things, phonemes, syllables and words in the text. The WaveNet processes this input, along with previously generated samples, and outputs a corresponding audio sample for the given time step. This step-by-step generation of the audio samples is computationally expensive but essential for generating complex, realistic-sounding audio.
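The step-by-step generation described above can be sketched as a simple loop in which each new sample is fed back in as input for the next time step. This is a hedged illustration rather than DeepMind’s implementation: `generate`, `predict_scores` and `toy_model` are hypothetical names, and the toy model merely stands in for the neural network so that the example runs on its own.

```python
import random


def generate(predict_scores, features, seed=0):
    """Generate one sample per time step, feeding earlier samples back in."""
    rng = random.Random(seed)
    samples = []
    for feature in features:
        # Hypothetical model call: returns one non-negative score for
        # each possible sample value at this time step.
        scores = predict_scores(samples, feature)
        values = list(range(len(scores)))
        # Draw the next sample according to the scores.
        samples.append(rng.choices(values, weights=scores, k=1)[0])
    return samples


def toy_model(previous_samples, feature):
    """Stand-in for the network: all weight on (last sample + feature) mod 4."""
    last = previous_samples[-1] if previous_samples else 0
    scores = [0.0] * 4
    scores[(last + feature) % 4] = 1.0
    return scores


# One (made-up) feature value per time step.
audio = generate(toy_model, features=[1, 1, 2, 0])
```

Because each pass through the loop needs the samples produced by every earlier pass, the steps cannot be run in parallel, which is why this style of generation is computationally expensive.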
Between them, the two WaveNet applications cover:
- a WaveNet for generating audio; and
- a more general application of a WaveNet for generating arbitrary data sequences.
Considering the 934 publication in particular, the claims define a neural network including many of the key features described above, such as:
- a convolutional sub-network comprising one or more audio-processing convolutional neural network layers;
- an input that includes the audio sample from the preceding time step; and
- an output that defines a score distribution over a plurality of possible audio samples for the given time step.
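To make the last of these features concrete, a “score distribution over a plurality of possible audio samples” can be pictured as one probability per candidate sample value. The sketch below assumes a softmax is used to turn raw network outputs into such a distribution; the claim language quoted above does not specify this, and the function name and the four example values are hypothetical.

```python
import math


def score_distribution(raw_scores):
    """Turn raw scores into probabilities that sum to 1 (softmax, assumed)."""
    peak = max(raw_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for four candidate sample values.
raw = [0.5, 2.0, -1.0, 0.0]
dist = score_distribution(raw)

# Index of the candidate value with the highest probability.
most_likely = max(range(len(dist)), key=dist.__getitem__)
```

The audio sample for the time step can then be chosen from this distribution, for example by sampling from it or by taking the most likely value.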
Of all the requirements for a patent to be granted by the EPO, two stand out when patenting this kind of computer-implemented invention:
- excluded subject matter; and
- inventive step.
In Europe, computer programs and mathematical methods per se are excluded from patentability. Mixed-type inventions, which include features that fall within these categories and other features that do not (eg, a computer itself), are permitted; however, only the technical features (ie, those features that do not fall within one of the categories of excluded subject matter) are taken into consideration in the assessment of inventive step.
An algorithm such as a neural network – which is implemented as a computer program – is generally not considered to be a technical feature per se, but when the algorithm is applied to a technical problem it can take on technical character and consequently be taken into account in the assessment of inventive step.
Thus, the key question that the WaveNet applications will face, should they come before the EPO, will be whether an algorithm for generating a data sequence has technical character. This will likely come down to the specific application of the algorithm and whether the algorithm solves a technical problem. The problem of generating realistic-sounding audio in a text-to-speech system is likely to be considered technical; however, it remains to be seen whether the general application of a WaveNet to non-specific sequences of data can overcome the inventive-step hurdle in Europe without being tied to a specific application. Indeed, in the upcoming update to its Guidelines for Examination, the EPO clearly states that AI and machine-learning algorithms should serve a technical purpose in order to be patentable (see “EPO updates its guidelines with a section dedicated to AI”).
Once these hurdles have been overcome, the applications may find success at the EPO. In the words of the inventors: “The fact that directly generating time step per time step with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next” – as are we.