Unsupervised Time Series Classification for Climate Data

Outstanding success of Convolutional Neural Network image classification in the last few years influenced application of this technique to a variety of embeddable entities. CNN image classification methods are getting high accuracies but most of them are based on supervised machine learning that requires labeling of input data and do not interpret unknown data. In this study we introduce unsupervised machine learning model that maps embeddable entity pairs to symmetric or asymmetric images by transforming pairwise vectors to Gramian Angular Fields (GAF) images. Based on CNN transfer learning image classification model we indicate metrics of GAF images visual similarity to symmetric or asymmetric classes. Experimenting with climate data, we illustrate several scenarios that show that this model is reliable for entity pairs with one-way relationships but insufficient for entity pairs with two-way relationships.


Introduction
In the last few years deep learning demonstrated great success outperforming previous state-of-theart machine learning techniques in various domains [12]. In particular, after the evolutionary model AlexNet was created in 2012, deep learning techniques became very powerful in Convolutional Neural Network (CNN) image classification [11].
Success stories of ImageNet deep learning influenced usage of CNN image classification as an instrument for other techniques such as vector classifications. One of such techniques was suggested by Ignacio Oguiza as a method of encoding time series to Gramian Angular Field (GAF) images and classify GAF images based on fast.ai library [8].
In his study author translated Olive Oil time series data to images and demonstrated that image classification works well for time series classification. Specifically, author used transfer learning technique that allowed to train model on small sets of training data. Fine-tuning a network with transfer learning works usually much faster and has higher accuracy than training image classification model from scratch [21]. Unsupervised classification method that we introduce in this paper follows both techniques mentioned above: encoding time series to GAF images and using transfer learning for image classification. What is completely different in our method is application of these techniques to unlabeled data. Our model determines whether two time series vectors are similar or not similar to each other. Method is build on generating pairs of similar and different vectors, and then transforming pairwise vectors to GAF images for classification. The illustration on Figure 1 shows that by transforming self-reflected, mirror vectors to GAF images we are creating symmetric squares. On the contrary, by transforming pairs of different vectors to GAF images we create asymmetric squares (see training data examples on Figure 2). The process of approximately classifying images to symmetric or asymmetric classes acts as a dimensionality reduction and therefore it is expected to work better 1 than more complex image classification. To focus on symmetry we were inspired by geometric deep learning [4].
Unsupervised machine learning finds many types of unknown patterns in data and helps to find features which can be useful for categorization. It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention. One of emerging areas of unsupervised machine learning is self-supervised learning where labeling is automated. In terms of terminology, it is still questionable what methods should be determined as unsupervised machine learning and what methods should be determined as self-supervised machine learning. As self-supervised machine learning technology mostly focuses on image or text classifications, we will call our novel techniques as an unsupervised machine learning. The pairwise vector entity classification method that we will introduce in this paper has limitations. In particular, we will show that this method is insufficient for entity pairs with two-way relationships and therefore it should be used only for entity pairs with one-way relationships. Considering these limitations, this method is applicable to entities that can be transformed to vectors such as time series, sequences, and many other types of entities that are well described as the list of *2Vec entities in GitHub paper [14].
We used this technique to solve one of the Natural Language Processing tasks in our technical blog [17]. We classified word pairs by word similarities to identify word pairs that are unexpected to be next to each other in documents or data streams.
For experiments in this study we will use data about history of daily temperatures in cities and will compare temperatures between city pairs. Based on this data we will show how this model works for classification of entity pairs with one-way relationships and why this model is not reliable for classification of entity pairs with two-way relationships.
Our solution for time series classification: • Convert daily temperature time series to vectors • Create reflected pairwise vectors by concatenating pairs of vectors • Transform pairwise vectors to GAF images • Classify images using CNN transfer learning • Examine how the model works for entity pairs with two-way and one-way relationships • Prove the hypothesis that the model is not reliable for entity pairs with two-way relationships • Demonstrate successful scenarios for entity pairs with one-way relationships.

Related Work
In the recent years deep learning techniques such as Long-Short Term Memory (LSTM) were successfully applied to time series forecasting applications [9]. CNN image classification techniques are very useful for time series classification especially when one-dimensional time series signals are transformed into two-dimensional matrices.
In study [20] authors suggest time series transformations to Gramian Angular Fields (GAF) and Markov Transition Fields (MTF) images. Techniques of transforming vectors to two-dimensional matrices allow machines to "visually" recognize signals and improve time series classification. In one of our studies we showed higher accuracy metrics of CNN image classification model with vector to GAF image transformation than accuracy metrics of models with plot pictures [16].
Self-supervised representation learning has made significant progress over the last years, almost reaching the performance of supervised baselines on many downstream tasks [2]. The closest to our method are discriminative self-supervised methods or contrastive methods that are based on direct comparison between training samples [5].
As our method categorizes pairs of entities to classes of similar pairs and not similar pairs, it is conceptually comparable with a combination of Siamese Nets [3] and self-supervised contrastive methods [5]. Siamese nets were first introduced in 1993 by Bromley and LeCun to solve signature verification as an image matching problem [3].
Recently, CNN deep learning techniques has shown great promise for the analysis of atmospheric imaging and for the estimation of tropical cyclones and their precursors. In one of the first studies such CNN deep learning was applied to climate pattern recognition problems [1]. However the survey study [13] of deep learning and machine learning techniques applied to hydrological processes, climate change and earth systems, shows that deep learning for climate data mining is still in the first stages of development, and the research is still progressing.

Methods
For data processing, model training and interpreting the results we will use the following steps: • Transform climate raw data to embedded vectors Data preparation, training and interpretation techniques are described in details in our technical blog [19].

Transform Time Series Data to Pairwise Vectors
For unsupervised entity pair classification we will use the following steps: • Concatenate entity vectors with themselves by reversing the second vector and label such vector pairs as 'same' • Concatenate vectors of different entities by reversing the second vector and label such vector pairs as 'different'.

Transform Pairwise Vectors to GAF Images
As a method of vector to image translation we will use Gramian Angular Field (GAF), a polar coordinate transformation based technique [20].

Train CNN Image Classification Models
To deal with comparatively small set of training data, instead of training the model from scratch, we will follow ResNet-50 transfer learning: load the results of model trained on images from the ImageNet database and fine tune it with data of interest [8].
Fast.ai CNN transfer learning image classification can be used for both supervised and unsupervised machine learning. Python code for transforming vectors to GAF images and fine tuning ResNet-50 is described in fast.ai forum [7].

Use Results of Trained Models
To calculate how similar are vectors to each other we will combine them as pairwise vectors and transform to GAF images. Then we will run GAF images through trained image classification model. If probability of getting to the 'same' class is higher than 0.5, we will consider the vector pair as similar, otherwise as not similar. Also if 'same' probability is high (for example, more then 0.8) the pair can be considered as very similar.

Transform Raw Data to Embedded Vectors
As raw data for this study we will use average daily temperatures in Celsius degrees for time period from January 1, 1980 to September 30, 2020 for 1000 most populous cities in the world taken from kaggle.com: "Temperature History of 1000 cities 1980 to 2020" [10].
To experiment with time series classifications, we will convert raw data to data set embedded vectors of the length 365 for average daily temperatures by city and by year and label vectors as 'city-year'.
The process of transforming raw data to vectors and python code are described in our technical blog [18]. Data preparation for pairwise vector method, model training and interpretation techniques are described in details in another post of our technical blog [19].

Transformation of Pairwise Vectors to GAF Images
For training data we will create 'same' class of symmetric images and 'different' class of asymmetric images. The Figures 1 and 2 illustrate that when converting self-reflected, mirror vectors to GAF images, we are getting symmetric images. Otherwise, when converting pairs of different vectors to GAF images, we are getting asymmetric images.
• For the 'same' class we will combine vectors with themselves reversing second vectors.
• For the 'different' class we will combine random pairs of vectors with temperatures of different years and different cities with reversed second vectors.

'Same' Class: Coalesce Vectors with Reversed Themselves
To generate training data images for the 'same' class we will combine vectors with reversed themselves. For each vector we will create a label 'cityyear'. For vector pairs we will combine these labels to 'city-year-city-year' labels and will use these labels as file names. Please see Figure 3 as an example of 'same' class images. This figure illustrates that self-reflected mirror vector and corresponding GAF image are symmetric.

'Different' Class: Coalesce Vectors with Reversed Second Vectors
To generate training data images for the 'different' class we will follow these steps: • To randomly select different pairs of vectors we will shuffle first vectors and reverse second vectors • The first vectors we will label as 'city1-year1' and second (reversed) vectors as 'city2-year2' • Than we will concatenate vector pairs and label pairwise vectors as 'city1-year1-city2-year2' • We will mark image class type as 'different'. Please see Figure 4 as an example of 'different' class images. This figure illustrates that concatenated different vectors and corresponding GAF image are asymmetric. Image transformation code is described in fast.ai forum [15] and CNN classification code for fast.ai transfer learning method is described in [6].

CNN Image Classification
To estimate the results we will calculate accuracy metrics as the proportion of the total number of predictions that were correct. The training model accuracy metric was about 96.5 percent.

Entity Pairs with Two-way Relationships
GAF images are based on polar coordinates, therefore transforming pairwise vectors to GAF images might generate inconsistent results for turned around entity pairs. To prove this hypothesis we will select 66 cities from continental West Europe with daily temperature data for 1992. For all city pairs we will create pairwise vectors in both directions, i.e. Paris -Berlin and Berlin -Paris. Then we will transform vectors to GAF images and run these images through the trained model to calculate their 'same' or 'different' probabilities.
As the results of using this model interpretation, most of city pairs in both directions have 'different' or 'same' metric. Out of the total number of city pairs (4290 city pairs) about 85 percent (3660 city pairs) are consistent. However about 15 percent of city pairs (630 city pairs) have inconsistent results. Please look at Table 1 for inconsistent city pair examples. Some of city pairs are located nearby, like Turin (Italy) and Monaco (Monaco) and some of city pairs are located far from each other, like Helsinki (Finland) and Bern (Switzerland).

Entity Pairs with One-way Relationships
We proved the hypothesis that pairwise vector classification model is not reliable for similarity prediction of pairs with two-way relationships and therefore this model should be used only for classification of entity pairs with one-way relationships.  Here we will show two scenarios of using this model to compare daily temperatures of pairs with one-way relationships. For the first scenario, we will calculate average vector of all yearly temperatures vectors for cities in Western Europe and compare it with yearly temperature vectors for all cities. For the second scenario we will determine a city located in the center of Western Europe and compare this city temperatures for years 2008 and 2016 with other cities.

Compare City, Year Temperature Vectors with Average of All Yearly Temperatures
To find cities with yearly temperatures similar to average temperatures for years from 1980 to 2019 in Western Europe we will calculate average time series for 2640 daily temperature time series (40 years and 66 cities).
As average of average temperature vectors provides a very smooth line, we don't expect that many city-year temperature vectors would be similar to it. In fact only 33 of city-year temperature time series have higher than 0.5 'same' probabilities and 22 pairs have probabilities higher than 0.65.
It is interesting that most of cities with high probabilities to be similar to average temperature are located on Mediterranean Sea, not far from each other. Here is a clockwise city list: Marseille (France), Nice (France), Monaco (Monaco), Genoa (Italy), Rome (Italy), Naples (Italy), and Salerno (Italy).

Compare City, Year Temperature Vectors with Central City
Here is another scenario of using pairwise vector model to compare daily temperatures of pairs with one-way relationships. From the Western Europe city list we selected the most centrally located city -Stuttgart (Germany) and created pairwise vectors by concatenating temperature vector pairs other city, Stuttgart for the years 2008 and 2016. Then we transformed pairwise vectors to GAF images and analyzed 'same' or 'different' probabilities by running images through our trained model. Based on the model, for both years cities located close to Stuttgart had high probability of similar temperature vectors and cities located far from Stuttgart had high probabilities of different temperature vectors. Please see Figure 5 as an example of two cities that had very different temperatures in both 2008 and 2016: Notice that in both years Stockholm (Sweden) was on the 'same' side and Nice (France) was on 'different' side. So if we would use probability 0.5 as a threshold, we would consider Stockholm daily temperatures as similar and Nice daily temperatures as not similar to Stuttgart daily temperatures for years 2008 and 2016.

Conclusion
In this study we introduced a novel unsupervised time series classification model. This model is embedding entities to vectors and combining entity pairs to pairwise vectors. Pairwise vectors are transformed to Gramian Angular Fields (GAF) images and GAF images are classified to symmetric or asymmetric classes using transfer learning CNN image classification.
We examined how this model works for entity pairs with two-way and one-way relationships and indicated that it is not reliable for classification of entity pairs with two-way relationships.
Based on climate data of cities from West Europe we demonstrated two successful scenarios for entity pairs with one-way relationships. In one scenario we selected centrally located city and found cities with similar and dissimilar daily temperature vectors. In another scenario we showed that European cities with the most smooth climate are located on Mediterranean Sea.

Broader Impact
There are several interesting directions in which this work could be extended. First, daily temperature time series data used in this study has spatial city location data. Projecting time series images to geographical locations will support fine-grained pattern detection. This practice is applicable to data mining scenarios where data that is a combination of spatial data and embedded vectors such as Electroencephalography pattern recognition [16].
Second, the time series classification approach of this study is implemented through transforming time series pairs to pairwise vectors and vectors to symmetric or asymmetric GAF images. In addition to time series, this process can be applied to a variety of embeddable entities such as words, documents, images, videos, etc. [14]. For example, this model can be used as unsupervised outlier detection in finding stock price time series that are very different from average stock prices.
Third, pairwise vectors model trained on symmetric and asymmetric GAF images can be train on some data domain and used for other data domains. For example, the model can be trained on time series data and used for word similarity classification.
Finally, the 'same' probability metric of pairwise vectors can be used to measure differences between vectors instead of measuring them typically through cosine similarities. Furthermore, through this metric direct graphs can be built for graph mining.