There are many methods to classify time series using neural networks. This blog post will mainly focus on two-dimensional CNNs and how 1D series can be represented as images.
Let $x_1, \dots, x_n$ be observations of some sensor (gyroscope, goniometer, etc.). In Cartesian coordinates, a point is given by an index $i$ and a value $x_i$.
The tuple $(i, x_i)$ can be transformed to polar coordinates by choosing a radius $r \geq 0$ and an angle $\phi$. The original coordinates are then recovered from $r \cos\phi$ and $r \sin\phi$.
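Now, let us assume that $r = 1$. Then all points lie on the unit circle. This requires $\cos\phi$ and $\sin\phi$ to be in the intervals $[-1, 1]$ or $[0, 1]$. The advantage of $[0, 1]$ is that the angle then stays in $[0, \pi/2]$, where the functions $\cos$ and $\sin$ are both injective.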
Since neural networks generally don’t care whether one scales to $[-1, 1]$ or $[0, 1]$, we choose the latter. The transformation is $\tilde{x}_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$.
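The scaling step can be sketched in a few lines of NumPy. The function name `minmax_scale` and the handling of constant series are my own assumptions, not part of the original post.

```python
import numpy as np

def minmax_scale(x):
    """Scale a 1D series to [0, 1] via (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series has no range; mapping it to zeros is an arbitrary choice.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# Hypothetical sensor readings, used only for illustration.
scaled = minmax_scale([2.0, 4.0, 3.0, 6.0])
```

In practice one might compute the minimum and maximum on the training set only, so that test data is scaled with the same constants.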
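Polar coordinates make it possible to use trigonometric identities like $\cos(\phi_i + \phi_j) = \cos\phi_i \cos\phi_j - \sin\phi_i \sin\phi_j$ and $\sin(\phi_i - \phi_j) = \sin\phi_i \cos\phi_j - \cos\phi_i \sin\phi_j$.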
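When we keep the assumption $r = 1$, the trigonometric functions of a scaled value $\tilde{x}_i$ are simply $\cos\phi_i = \tilde{x}_i$ and $\sin\phi_i = \sqrt{1 - \tilde{x}_i^2}$. However, we lost the time because we set $r = 1$. Since $i$ is just an index, this won’t hurt the performance of the CNN.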
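A quick numerical check of this encoding, assuming the angle is obtained as $\phi = \arccos(\tilde{x})$ for values already scaled to $[0, 1]$ (the convention used by Wang and Oates). The helper name `polar_angle` is hypothetical.

```python
import numpy as np

def polar_angle(x_scaled):
    """Return phi = arccos(x) for values already scaled to [0, 1]."""
    x_scaled = np.asarray(x_scaled, dtype=float)
    # Clipping guards against tiny floating-point overshoots outside [0, 1].
    return np.arccos(np.clip(x_scaled, 0.0, 1.0))

x = np.array([0.0, 0.25, 0.5, 1.0])
phi = polar_angle(x)

# With r = 1, cos(phi) recovers the value and sin(phi) is sqrt(1 - x^2).
cos_matches = np.allclose(np.cos(phi), x)
sin_matches = np.allclose(np.sin(phi), np.sqrt(1.0 - x**2))
```

All angles land in $[0, \pi/2]$, so no two different values share the same cosine or sine.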
(quasi)-Gramian Angular Field (GAF) and Recurrence Plot (RP)
GAFs and RPs are both 2D plots that show how each time step relates to every other time step. For example, we could look at how much higher the value is at time $i$ in comparison to time $j$.
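The paper by Wang and Oates [1] introduced the following two GAFs: the Gramian Angular Summation Field, $\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j)$, and the Gramian Angular Difference Field, $\mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j)$, for all $i, j$ in the time series. These two plots use polar coordinates in the way described in the last section.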
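Both fields can be computed without calling `arccos` at all, by expanding the angle sum and difference with the identities from the previous section. The following is a minimal sketch under the assumption that the input is already scaled to $[0, 1]$ and that $\cos\phi = \tilde{x}$. The function name is my own.

```python
import numpy as np

def gramian_angular_fields(x_scaled):
    """Return (GASF, GADF) for a 1D series already scaled to [0, 1]."""
    x = np.clip(np.asarray(x_scaled, dtype=float), 0.0, 1.0)
    s = np.sqrt(1.0 - x**2)  # sin(phi) when cos(phi) = x and phi is in [0, pi/2]
    # cos(phi_i + phi_j) = cos_i * cos_j - sin_i * sin_j
    gasf = np.outer(x, x) - np.outer(s, s)
    # sin(phi_i - phi_j) = sin_i * cos_j - cos_i * sin_j
    gadf = np.outer(s, x) - np.outer(x, s)
    return gasf, gadf

x = np.array([0.0, 0.5, 1.0])
gasf, gadf = gramian_angular_fields(x)

# Cross-check against the direct trigonometric definition.
phi = np.arccos(x)
gasf_direct = np.cos(phi[:, None] + phi[None, :])
gadf_direct = np.sin(phi[:, None] - phi[None, :])
```

The summation field is symmetric, while the difference field is antisymmetric with zeros on the diagonal.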
In comparison, RPs use cartesian coordinates. For a vector of dimension , the definition is for all .
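Recurrence plots come in several variants. The sketch below assumes one common form, the pairwise absolute difference $|x_i - x_j|$, with an optional threshold `eps` that turns it into a binary image. Both the function name and this particular definition are assumptions for illustration and may differ from the definition used in this post.

```python
import numpy as np

def recurrence_plot(x, eps=None):
    """Pairwise distance matrix of a 1D series, optionally thresholded."""
    x = np.asarray(x, dtype=float)
    # Broadcasting builds the full n x n matrix of |x_i - x_j|.
    dist = np.abs(x[:, None] - x[None, :])
    if eps is None:
        return dist
    # Binary variant: 1 where two time steps are closer than eps, else 0.
    return (dist <= eps).astype(float)

rp = recurrence_plot([1.0, 3.0, 2.0])
rp_binary = recurrence_plot([1.0, 3.0, 2.0], eps=1.0)
```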
Let us look at a simple example: a time series consisting of two values $x_1$ and $x_2$. There are four possible pairs: $(x_1, x_1)$, $(x_1, x_2)$, $(x_2, x_1)$, and $(x_2, x_2)$.
The recurrence plot is given by:
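To calculate the GAFs, polar coordinates are needed. Scaling $x_1$ and $x_2$ to $[0, 1]$ results in $\tilde{x}_1$ and $\tilde{x}_2$. Then by the trigonometric formulas from above, each entry of the first GAF is given by $\cos(\phi_i + \phi_j) = \tilde{x}_i \tilde{x}_j - \sqrt{1 - \tilde{x}_i^2} \sqrt{1 - \tilde{x}_j^2}$.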
The first GAF is:
The second GAF can be calculated similarly.
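As a hedged illustration with concrete numbers, assume $x_1 < x_2$ (an assumption made here only to get definite values) and the encoding $\phi = \arccos(\tilde{x})$. Min-max scaling then maps the two values to $0$ and $1$, and the two GAFs work out as follows:

```latex
\tilde{x}_1 = 0, \quad \tilde{x}_2 = 1
\quad\Rightarrow\quad
\phi_1 = \arccos(0) = \tfrac{\pi}{2}, \quad \phi_2 = \arccos(1) = 0

\mathrm{GASF} =
\begin{pmatrix}
\cos(\phi_1 + \phi_1) & \cos(\phi_1 + \phi_2) \\
\cos(\phi_2 + \phi_1) & \cos(\phi_2 + \phi_2)
\end{pmatrix}
=
\begin{pmatrix}
-1 & 0 \\
0 & 1
\end{pmatrix}

\mathrm{GADF} =
\begin{pmatrix}
\sin(\phi_1 - \phi_1) & \sin(\phi_1 - \phi_2) \\
\sin(\phi_2 - \phi_1) & \sin(\phi_2 - \phi_2)
\end{pmatrix}
=
\begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix}
```

Any two-value series with $x_1 < x_2$ yields the same pair of matrices, because min-max scaling removes the absolute magnitudes.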
These matrices can be fed to the neural network as input. In general, we can even use our own operations like or . It’s also possible to combine different time series like or .
Time series can become fairly long. For example, a time series containing measurements would result in a GAF or RP plot. Hence, we will first reduce the size with a piecewise aggregate approximation (PAA).
This function takes as input an $n \times m$ matrix where $n$ is the length of one time series and $m$ is the number of time series. Next, we apply PAA to the data and calculate the plots.
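A minimal sketch of PAA under the assumption that the series length is divisible by the number of segments: each series is cut into equal-length segments and every segment is replaced by its mean. The function name `paa` and the `(length, number of series)` layout are assumptions for illustration.

```python
import numpy as np

def paa(data, n_segments):
    """Piecewise aggregate approximation along the time axis (axis 0)."""
    data = np.asarray(data, dtype=float)
    n, m = data.shape  # n = series length, m = number of series
    if n % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    seg_len = n // n_segments
    # Group consecutive time steps into segments, then average each segment.
    return data.reshape(n_segments, seg_len, m).mean(axis=1)

# Two hypothetical series of length 6, stored as columns.
series = np.arange(12, dtype=float).reshape(6, 2)
reduced = paa(series, n_segments=3)
```

Reducing a series from length $n$ to $k$ segments shrinks the resulting image from $n \times n$ to $k \times k$.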
Since GAFs return values from $-1$ to $1$, the RP plots have to be scaled to the same range. The outer product is a slow operation, so I would recommend calculating the images only once and storing them in memory.
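One way to bring an RP image into the same range as the GAFs is a linear rescaling to $[-1, 1]$. This is a sketch with a hypothetical helper name. Whether to scale per image or with dataset-wide constants is a design choice I am assuming here, not something the post specifies.

```python
import numpy as np

def scale_to_symmetric_range(img):
    """Linearly rescale an image so its values span [-1, 1]."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A constant image carries no contrast; map it to zeros.
        return np.zeros_like(img)
    return 2.0 * (img - lo) / (hi - lo) - 1.0

rp = np.array([[0.0, 2.0], [2.0, 0.0]])
rp_scaled = scale_to_symmetric_range(rp)
```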
The next step is to define a model for the neural network. I got the best results with a Wide Residual Network [2]. I set the network width to and the depth to .
During training, it is important to use some kind of data augmentation, because residual networks tend to overfit. The following code randomly adds Gaussian distributed noise to the whole input matrix.
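A minimal sketch of such an augmentation step. The noise level `sigma`, the probability `prob` of applying noise, and the function name are all assumed values for illustration, not the settings used in my experiments.

```python
import numpy as np

def add_gaussian_noise(batch, sigma=0.05, prob=0.5, rng=None):
    """With probability `prob`, add zero-mean Gaussian noise to the whole input."""
    rng = np.random.default_rng() if rng is None else rng
    batch = np.asarray(batch, dtype=float)
    if rng.random() >= prob:
        # Leave this batch untouched.
        return batch.copy()
    return batch + rng.normal(0.0, sigma, size=batch.shape)

rng = np.random.default_rng(0)
images = np.zeros((4, 8, 8))  # hypothetical batch of four 8x8 plots
noisy = add_gaussian_noise(images, sigma=0.05, prob=1.0, rng=rng)
```

The noise should only be applied during training, so that validation and test images stay unchanged.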
I tested the 2D CNN model on an activity recognition dataset with 10-fold cross validation. There were in total features (time series) which were transformed to RP/GAF plots of size .
The 2D CNN model performed consistently better than an MLP and at least as well as a 1D CNN and a 1D LSTM-CNN. More tests are of course needed, but for specific datasets the performance is quite good.
To conclude this blog post, here are some input images (using some threshold).
Random noise with :
Gyroscope z-axis with :
[1] Z. Wang and T. Oates. “Imaging Time-Series to Improve Classification and Imputation”. https://arxiv.org/abs/1506.00327
[2] S. Zagoruyko and N. Komodakis. “Wide Residual Networks”. https://arxiv.org/abs/1605.07146
[3] J. Debayle, N. Hatami, and Y. Gavet. “Classification of Time-Series Images Using Deep Convolutional Neural Networks”. https://arxiv.org/abs/1710.00886