MULTIDIMENSIONAL SEQUENCE CLUSTERING WITH ADAPTIVE ITERATIVE DYNAMIC TIME WARPING

Multimedia sequence matching is currently a pressing problem in artificial intelligence. Despite great progress in this field of computer science, huge data arrays require near-real-time processing, which significantly limits the applicable methods. In this paper, the authors attempt to modify and enhance a time series approach, orienting it toward non-stationary data of different lengths. The processing procedure is enriched by sequence alignment with iterative dynamic time warping, which matches two temporal data segments. Computational complexity is reduced by applying Kohonen self-organizing maps for clustering. The mathematical presentation is given in scalar, vector and matrix forms in order to cover all possible use cases. An example of video sequence processing with the novel approach is provided to demonstrate its efficiency. The proposed technique can also be applied to natural language and signal processing, bioinformatics and financial data analysis.


INTRODUCTION
With the rapid growth of data freely available on the Internet, matching multimedia content has recently become a pressing problem [1][2][3][4]. This article analyses the prerequisites for content matching and the limitations of existing techniques. Primary attention is given to video sequence processing as the most complicated type of multimedia. Before any artificial intelligence method is applied, however, a video sequence needs to be segmented and classified according to some criterion.
Aside from video, the problem of temporal sequence segmentation and clustering arises in many practical applications [5][6][7], and a great number of algorithms have been developed to cope with it [8][9][10][11][12][13]. When the sequences under study are of different lengths, the situation becomes even more complicated, since traditional metrics for cluster analysis cannot be applied directly. In such cases, similarity measures and functions along with dimensionality reduction techniques are usually employed, among them: Minkowski distances (including Euclidean and Manhattan distances), maximum distance, Mahalanobis distance, Hamming distance, Gaussian affinities, Cosine similarities, Jaccard index, cross-correlation function, Landmarks similarity, Dynamic Time Warping (DTW), Longest Common Substring, spectral decomposition through the Discrete Fourier Transform, Discrete Wavelet Transform, Singular Value Decomposition, Piecewise Aggregate / Constant / Linear Approximation, and Chebyshev Polynomials, to name a few [14][15][16].
The main challenges here are linked to multidimensionality and processing time. The former involves the curse of dimensionality and the ambiguity of similarity measures in a multidimensional feature space; the latter should obviously tend to near real time. Dimensionality reduction without losing essential information is the goal of any approach designed to cope with high-dimensional time sequences. In this respect, DTW should be mentioned first of all [16,17]. It enables evaluation of the distance between any two sample series from a sequence of observations. Consider two one-dimensional sequences $X = \{x(1), \ldots, x(N)\}$ and $Y = \{y(1), \ldots, y(M)\}$ of different lengths, the distance between which is to be estimated. This distance should then be suitable for clustering with any of the known approaches. To compute it, the $(N \times M)$ distance matrix is formed in an arbitrary metric for all pairs of sample elements, $d(x(k), y(l))$, $k = 1, 2, \ldots, N$; $l = 1, 2, \ldots, M$. The so-called 'warping path' is computed using this matrix. It is formed by the sequence $w_q = (k_q, l_q)$, $q = 1, 2, \ldots, L$, with $\max(N, M) \le L \le N + M - 1$, and it is the basis for similarity estimation between the series $X$ and $Y$. The optimal warping path provides the minimum of the following criterion [18], whose value serves as the distance between the series:

$$DTW(X, Y) = \min_{w} \sum_{q=1}^{L} d\big(x(k_q), y(l_q)\big) \qquad (1)$$

Minimization of criterion (1) is performed by dynamic programming based on the recurrent relation

$$D(k, l) = d\big(x(k), y(l)\big) + \min\{D(k-1, l-1),\ D(k-1, l),\ D(k, l-1)\},$$

where $D(k, l)$ is the cumulative distance between the sequence elements $x(k)$, $y(l)$. The DTW distance is then

$$DTW(X, Y) = D(N, M).$$
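The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual boundary conditions $D(0, 0) = 0$ and $D(k, 0) = D(0, l) = \infty$ (not stated in the text) and uses the absolute difference as the scalar metric $d$; the function name `dtw` is hypothetical. The two nested loops make the $O(NM)$ cost of dynamic programming explicit, which is the cost the iterative variant later aims to reduce.

```python
import math
from typing import Sequence


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """DTW distance between 1-D sequences x (length N) and y (length M).

    Implements D(k, l) = d(x(k), y(l)) + min(D(k-1, l-1), D(k-1, l), D(k, l-1))
    with d(a, b) = |a - b| and returns DTW(X, Y) = D(N, M).
    """
    n, m = len(x), len(y)
    # Row 0 and column 0 are boundary cells: D(0, 0) = 0, all others infinite,
    # so every warping path starts at (1, 1) and ends at (N, M).
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            cost = abs(x[k - 1] - y[l - 1])  # d(x(k), y(l))
            D[k][l] = cost + min(D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    return D[n][m]


# Usage: sequences of different lengths
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: y only repeats an element of x
print(dtw([0, 0], [1]))              # 2.0: both zeros are matched to the single 1
```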
This approach was extended to multidimensional sequences $x(k), y(l) \in \mathbb{R}^n$ in [19]. Here the Euclidean metric is used as the distance between observations:

$$d\big(x(k), y(l)\big) = \lVert x(k) - y(l) \rVert_2 \qquad (2)$$

In the case of matrix signal processing, e.g., video sequence processing, the Frobenius norm $d(x(k), y(l)) = \lVert x(k) - y(l) \rVert_F$ for matrix-valued observations can easily be introduced, which generalizes metric (2). The main challenges with DTW appear when long time series, such as video sequences, are to be analyzed. Dynamic programming (and the 'curse of dimensionality' connected with it) leads to tremendous computational complexity in this case, which makes the approach ineffective [16]. With this in mind, it is reasonable to modify the DTW approach for processing time series of considerable length.
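The generalization from scalar to vector and matrix observations only changes the metric $d$ inside the same recurrence. The sketch below makes this concrete under the assumption that observations are plain Python lists (vectors) or lists of rows (matrices); the helper names `euclidean`, `frobenius` and `dtw` are hypothetical. Note that a $1 \times n$ matrix has the same Frobenius and Euclidean distance, which is the sense in which the Frobenius norm generalizes metric (2).

```python
import math
from typing import Any, Callable, List, Sequence

Matrix = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Metric (2): ||x(k) - y(l)||_2 for vector observations."""
    return math.dist(a, b)


def frobenius(a: Matrix, b: Matrix) -> float:
    """||x(k) - y(l)||_F for matrix observations, summed row by row."""
    return math.sqrt(sum((u - v) ** 2
                         for row_a, row_b in zip(a, b)
                         for u, v in zip(row_a, row_b)))


def dtw(x: Sequence[Any], y: Sequence[Any],
        d: Callable[[Any, Any], float]) -> float:
    """DTW with a pluggable observation metric d (same recurrence as for (1))."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            D[k][l] = d(x[k - 1], y[l - 1]) + min(
                D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    return D[n][m]


# Usage: two toy "videos" of 2x2 frames with different numbers of frames
dark = [[0.0, 0.0], [0.0, 0.0]]
bright = [[1.0, 1.0], [1.0, 1.0]]
print(frobenius(dark, bright))                          # 2.0
print(dtw([dark, bright], [dark, dark, bright], frobenius))  # 0.0
```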

TIME SERIES MODEL DEVELOPMENT BASED ON ITERATIVE DYNAMIC TIME WARPING
Computational complexity can be significantly reduced with the iterative dynamic time warping approach discussed in [20,21]. It shortens the analyzed series by replacing particular sequence intervals with estimates computed over them.
Thus, each series $x_q$ is replaced by the mean values computed over its intervals, $\bar{x}_q(1), \bar{x}_q(2), \ldots, \bar{x}_q(g)$, so that each sequence is represented by $1 \times g$ scores. This approach is only applicable when the series properties remain unchanged within each of the formed intervals. The latter condition does not always hold in real video processing applications, which is why mean-based clustering may turn out to be ineffective. From the point of view of the general time series clustering problem, the technique discussed above belongs to the feature-based approaches. Approaches based on time series models seem more promising for non-stationary sequence processing [8]. It should be noted, however, that the latter approaches have been developed for sequences of the same length. Therefore, it makes sense to merge the time-series-model-based approach with iterative dynamic time warping.
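A minimal sketch of the mean-based reduction, assuming (since the text does not prescribe it) that each series is split into $g$ contiguous, almost equal intervals. The names `interval_means` and `dtw` are hypothetical. DTW on the reduced series costs $O(g_x g_y)$ instead of $O(NM)$; the last line illustrates the weakness noted above, since a ramp and a constant interval yield the same mean.

```python
import math
from typing import List, Sequence


def interval_means(x: Sequence[float], g: int) -> List[float]:
    """Replace a series by the means of g contiguous intervals.

    Assumption: an (almost) equal split; interval boundaries are not
    specified in the text.
    """
    n = len(x)
    if not 1 <= g <= n:
        raise ValueError("need 1 <= g <= len(x)")
    bounds = [i * n // g for i in range(g + 1)]  # every interval is non-empty
    return [sum(x[bounds[i]:bounds[i + 1]]) / (bounds[i + 1] - bounds[i])
            for i in range(g)]


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain DTW with d(a, b) = |a - b|, applied to the reduced series."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            D[k][l] = abs(x[k - 1] - y[l - 1]) + min(
                D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    return D[n][m]


a = [1, 1, 1, 1, 5, 5, 5, 5]   # N = 8
b = [1, 1, 5, 5, 5, 5]         # M = 6
print(dtw(interval_means(a, 2), interval_means(b, 3)))  # 0.0

# A ramp and a constant share the same interval mean:
print(interval_means([0, 1, 2, 3, 4], 1), interval_means([2, 2, 2, 2, 2], 1))
```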
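Now consider a set of temporal sequences of different lengths, each of which is divided into $p$ intervals; these intervals may contain a different number of observations in each series $x_q$. Let us denote an observation of the $q$-th series in the $r$-th interval as $x_{qr}(\tau)$, where $\tau$ is the number of this observation within the interval, ranging from 1 to the length of the interval. Assume that each observation corresponds to the following linear adaptive model:

$$x_{qr}(\tau) = a_{qr} + b_{qr}\tau = w_{qr}^{T}\varphi(\tau),$$

where $w_{qr} = (a_{qr}, b_{qr})^{T}$ and $\varphi(\tau) = (1, \tau)^{T}$. Model parameters are set with the recursive least squares method:

$$w_{qr}(\tau) = w_{qr}(\tau-1) + \frac{P_{qr}(\tau-1)\varphi(\tau)\big(x_{qr}(\tau) - w_{qr}^{T}(\tau-1)\varphi(\tau)\big)}{1 + \varphi^{T}(\tau)P_{qr}(\tau-1)\varphi(\tau)},$$

$$P_{qr}(\tau) = P_{qr}(\tau-1) - \frac{P_{qr}(\tau-1)\varphi(\tau)\varphi^{T}(\tau)P_{qr}(\tau-1)}{1 + \varphi^{T}(\tau)P_{qr}(\tau-1)\varphi(\tau)}. \qquad (6)$$

As a result of such processing, each $q$-th series $x_q$ corresponds to the set of parameter vectors $w_{qr}$, $r = 1, 2, \ldots, p$, on the basis of which clustering is further performed. It should also be noted that stationarity of the series within the partition intervals makes the parameters $b_{qr}$ tend to zero, i.e., the procedure turns back into iterative dynamic time warping. In the case of multidimensional time vectors $x_{qr}(\tau) \in \mathbb{R}^n$, the following model corresponds to each series:

$$x_{qr}(\tau) = A_{qr}\varphi(\tau), \qquad (7)$$

where $A_{qr}$ is an $(n \times 2)$ matrix of parameters to be estimated.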
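The recursive least squares step for the scalar model can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: $P$ is initialized to $\delta I$ with a large $\delta$ and $w$ to zero, intervals are formed by an almost equal split, and $\tau$ restarts at 1 in every interval; `rls_trend` and `interval_parameters` are hypothetical names. The usage line shows the stationarity remark above in action: on a constant interval the estimate of $b_{qr}$ is close to zero and $a_{qr}$ is close to the interval mean.

```python
from typing import List, Sequence, Tuple


def rls_trend(xs: Sequence[float], delta: float = 1e6) -> Tuple[float, float]:
    """Recursive least squares for x(tau) = a + b * tau, tau = 1, 2, ...

    w = (a, b)^T, phi(tau) = (1, tau)^T. P starts at delta * I (an assumed
    initialisation) and w starts at zero.
    """
    w = [0.0, 0.0]
    P = [[delta, 0.0], [0.0, delta]]
    for tau, x in enumerate(xs, start=1):
        phi = (1.0, float(tau))
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        gain = [Pphi[0] / denom, Pphi[1] / denom]
        err = x - (w[0] * phi[0] + w[1] * phi[1])  # one-step prediction error
        w = [w[0] + gain[0] * err, w[1] + gain[1] * err]
        # P <- P - gain (P phi)^T; P stays symmetric
        P = [[P[i][j] - gain[i] * Pphi[j] for j in range(2)] for i in range(2)]
    return w[0], w[1]


def interval_parameters(x: Sequence[float], p: int) -> List[Tuple[float, float]]:
    """(a_qr, b_qr) for each of p intervals; tau restarts at 1 in every interval."""
    n = len(x)
    bounds = [r * n // p for r in range(p + 1)]
    return [rls_trend(x[bounds[r]:bounds[r + 1]]) for r in range(p)]


# A rising interval followed by a stationary one
series = [2.0 + 0.5 * t for t in range(1, 11)] + [7.0] * 10
print(interval_parameters(series, 2))  # close to [(2.0, 0.5), (7.0, 0.0)]
```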
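To configure the parameters of model (7), a recursive algorithm of the following type can be used:

$$A_{qr}(\tau) = A_{qr}(\tau-1) + \frac{\big(x_{qr}(\tau) - A_{qr}(\tau-1)\varphi(\tau)\big)\varphi^{T}(\tau)P_{qr}(\tau-1)}{1 + \varphi^{T}(\tau)P_{qr}(\tau-1)\varphi(\tau)}, \qquad (8)$$

with $P_{qr}(\tau)$ updated as in (6). Thus, each series $x_q$ corresponds to one matrix $A_{qr}$ per interval, and these matrices can be merged into a composite matrix $A_q = (A_{q1}, A_{q2}, \ldots, A_{qp})$. It should be mentioned that relations (7) and (8) have been used for online video frame segmentation.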
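For vector observations all $n$ components share the regressor $\varphi(\tau) = (1, \tau)^T$, so one matrix $P$ and one gain vector serve every row of $A_{qr}$, and only the prediction error becomes a vector. The sketch below assumes this standard multi-output recursive least squares form and the same initialization and interval split as before; `rls_matrix_trend` and `composite_matrix` are hypothetical names. The composite matrix places the $(n \times 2)$ blocks side by side.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def rls_matrix_trend(xs: Sequence[Sequence[float]], delta: float = 1e6) -> Matrix:
    """Estimate the (n x 2) matrix A in x(tau) = A (1, tau)^T recursively."""
    n = len(xs[0])
    A = [[0.0, 0.0] for _ in range(n)]
    P = [[delta, 0.0], [0.0, delta]]  # shared by all n components
    for tau, x in enumerate(xs, start=1):
        phi = (1.0, float(tau))
        Pphi = [P[i][0] * phi[0] + P[i][1] * phi[1] for i in range(2)]
        denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        gain = [Pphi[0] / denom, Pphi[1] / denom]
        # vector prediction error x(tau) - A phi(tau)
        err = [x[i] - (A[i][0] * phi[0] + A[i][1] * phi[1]) for i in range(n)]
        # A <- A + err gain^T
        A = [[A[i][j] + err[i] * gain[j] for j in range(2)] for i in range(n)]
        P = [[P[i][j] - gain[i] * Pphi[j] for j in range(2)] for i in range(2)]
    return A


def composite_matrix(series: Sequence[Sequence[float]], p: int) -> Matrix:
    """A_q = (A_q1, A_q2, ..., A_qp): per-interval blocks placed side by side."""
    N = len(series)
    bounds = [r * N // p for r in range(p + 1)]
    blocks = [rls_matrix_trend(series[bounds[r]:bounds[r + 1]]) for r in range(p)]
    return [[v for block in blocks for v in block[i]]
            for i in range(len(series[0]))]


# Two-component series: a constant and a linear trend, split into 2 intervals
series = [[5.0, 1.0 + 2.0 * t] for t in range(1, 11)]
print(composite_matrix(series, 2))  # close to [[5, 0, 5, 0], [1, 2, 11, 2]]
```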
Finally, when dealing with matrix signal problems, a mathematical model (9) can be associated with each sequence. The parameters of model (9) can be configured with a modified algorithm. To do so, equation (9) is associated with an auxiliary model (10), from which the least squares estimate of the matrix $A_{qr}$ is obtained. By analogy with the previous manipulations, a second auxiliary model is introduced, from which the least squares estimate of the row-vector $B_{qr}$ in (10) is obtained.
This estimate can also be written in recursive form. As a result, the series $x_q$ corresponds to a matrix $A_q$ and a vector $B_q^{T}$, which are further used for clustering. Hence, the object of clustering is the parameter vector $A_q$ for one-dimensional series, the matrix $A_q$ for multidimensional series, and the pair of the matrix $A_q$ and the vector $B_q^{T}$ for matrix series.

TIME SERIES CLUSTERING WITH KOHONEN SELF-ORGANIZING MAPS
In order to cluster the arrays of vectors $A_q$, $B_q^{T}$ (for one-dimensional and multidimensional sequences) and matrices $A_q$ (for multidimensional and matrix signals), we have used an approach based on centroid prototypes, which partitions the whole data array into $m$ clusters with centroids $c_j$, $j = 1, 2, \ldots, m$, that need to be set. Further, without loss of generality, the vectors to be clustered will be denoted as $B_q$ and the matrices as $A_q$. The clustering problem has been solved with Kohonen self-organizing maps [22]. First, consider clustering (self-organization) of the vector signals $B_q$, $q = 1, 2, \ldots, Q$.
The self-organizing procedure consists of two consecutive steps: competition and synaptic adaptation. The procedure begins with a rather arbitrary setting of the centroids $c_j(0)$. When the first vector $B_1$ is presented, the centroid $c_{j^*}(0)$ nearest to $B_1$ in the sense of distance (2) is found, such that:

$$\lVert B_1 - c_{j^*}(0) \rVert = \min_{j} \lVert B_1 - c_j(0) \rVert.$$

Setting this winner is the aim of competition. The adaptation step adjusts the location of the winning centroid with the following recursive relation (the 'winner takes all' learning rule):

$$c_j(1) = \begin{cases} c_j(0) + \eta(1)\big(B_1 - c_j(0)\big), & \text{if } c_j(0) \text{ is the winner},\\ c_j(0) & \text{for all the other centroids.} \end{cases}$$
Here, $\eta(1)$ is a learning-rate (step) parameter that monotonically decreases during the adaptation process and is usually specified empirically. After the $q$-th vector $B_q$ has been presented for processing and the winner has been determined, adaptation is implemented according to the relation:

$$c_j(q) = \begin{cases} c_j(q-1) + \eta(q)\big(B_q - c_j(q-1)\big), & \text{if } c_j(q-1) \text{ is the winner},\\ c_j(q-1) & \text{for all the other centroids.} \end{cases} \qquad (15)$$
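One pass of competition and 'winner takes all' adaptation over $B_1, \ldots, B_Q$ can be sketched as below. The learning-rate schedule $\eta(q) = 0.5/q$ in the usage example is only an assumed, monotonically decreasing choice, and `wta_pass` is a hypothetical name; initial centroids are passed in explicitly, reflecting the arbitrary initialization described in the text.

```python
import math
from typing import Callable, List, Sequence, Tuple


def wta_pass(samples: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]],
             eta: Callable[[int], float]) -> Tuple[List[List[float]], List[int]]:
    """One pass of the 'winner takes all' rule over B_1, ..., B_Q."""
    C = [list(c) for c in centroids]  # c_j(0), j = 1..m
    labels = []
    for q, b in enumerate(samples, start=1):
        # Competition: nearest centroid in the Euclidean metric (2).
        j = min(range(len(C)), key=lambda i: math.dist(b, C[i]))
        # Adaptation: only the winner moves towards B_q, by step eta(q).
        rate = eta(q)
        C[j] = [c + rate * (v - c) for c, v in zip(C[j], b)]
        labels.append(j)
    return C, labels


C, labels = wta_pass([[1.0], [-1.0], [9.0], [11.0]], [[0.0], [10.0]],
                     lambda q: 0.5 / q)
print(labels)  # [0, 0, 1, 1]
```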
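The adjustment of the centroid locations continues until the whole sample $B_1, B_2, \ldots, B_Q$ is exhausted. If the centroids have not stabilized by then (as checked by a threshold inequality on their change), the sample is presented again. For matrix signals $A_q$, the winner among the matrix centroids $C_j$ is determined with the Frobenius norm, and the adaptation step is implemented with the following recursive relation:

$$C_j(q) = \begin{cases} C_j(q-1) + \eta(q)\big(A_q - C_j(q-1)\big), & \text{if } C_j(q-1) \text{ is the winner},\\ C_j(q-1) & \text{for all the other centroids.} \end{cases} \qquad (17)$$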
It is easy to see that adaptation procedures (15) and (17) are structurally identical, which makes it possible to use relation (17) for vector sequences as well, especially since the Frobenius norm is a generalization of the Euclidean one.
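The matrix form of the procedure can be sketched without reshaping frames into vectors: the winner is chosen by the Frobenius norm and rule (17) is applied element-wise. The stopping test below (no centroid moving by more than `eps` during a pass, or `max_epochs` reached) is an assumption standing in for the stabilization inequality, and `som_matrices` is a hypothetical name; the learning rate uses a global presentation counter so that it keeps decreasing across passes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def frobenius(a: Matrix, b: Matrix) -> float:
    return math.sqrt(sum((u - v) ** 2
                         for ra, rb in zip(a, b) for u, v in zip(ra, rb)))


def som_matrices(samples: Sequence[Matrix], centroids: Sequence[Matrix],
                 eta: Callable[[int], float], eps: float = 1e-4,
                 max_epochs: int = 100) -> Tuple[List[Matrix], List[int]]:
    """Winner-takes-all clustering of matrices with rule (17).

    Passes over A_1..A_Q repeat until no centroid moves by more than eps
    (Frobenius norm) during a pass, or max_epochs is reached.
    """
    C = [[list(row) for row in c] for c in centroids]
    t = 0  # global presentation counter
    labels: List[int] = []
    for _ in range(max_epochs):
        before = [[list(row) for row in c] for c in C]
        labels = []
        for A in samples:
            t += 1
            j = min(range(len(C)), key=lambda i: frobenius(A, C[i]))
            rate = eta(t)
            C[j] = [[c + rate * (a - c) for c, a in zip(rc, ra)]
                    for rc, ra in zip(C[j], A)]
            labels.append(j)
        if max(frobenius(c, c0) for c, c0 in zip(C, before)) <= eps:
            break
    return C, labels


frames = [[[0.0, 0.0], [0.0, 0.0]], [[0.1, 0.1], [0.1, 0.1]],
          [[5.0, 5.0], [5.0, 5.0]], [[4.9, 4.9], [4.9, 4.9]]]
init = [[[0.5, 0.5], [0.5, 0.5]], [[4.0, 4.0], [4.0, 4.0]]]
C, labels = som_matrices(frames, init, lambda t: 0.5 / t)
print(labels)  # [0, 0, 1, 1]
```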

EXPERIMENTAL RESULTS OF VIDEO SEQUENCE SEGMENTATION
In order to analyze the efficiency of the proposed multidimensional time series clustering procedure, a series of video sequences has been used as typical representatives of such data. Fig. 1 shows an example of a video sequence taken from the 'Destroyed in Seconds' documentary series on the Discovery Channel. This sequence consists of several segments (500 frames in total) showing a plane take-off followed by a crash that caused combustion and explosion of the aircraft. A human expert can easily partition this video into 3 semantically meaningful consecutive segments: the runway acceleration phase shot from a single camera angle (frames 1 to 90), the take-off phase shot from another angle (frames 91 to 266), and the crash phase shot from a third angle (frames 267 to 500). The scene changes are indicated in Fig. 1 by transition frames depicting the accident. Temporal video segmentation (partitioning into scenes) with the proposed approach is shown by the plot in Fig. 2. It can be clearly seen that the events of acceleration and take-off are distinctly identified by value peaks in the plot at the corresponding moments of time. Thus, we may conclude that the proposed approach makes video sequence clustering possible. It should be noted that the transition between the first two segments is easily identified even though there are no significant data changes there, and the frames within the remaining segments differ insignificantly. At the same time, according to the plot, the third segment is not as homogeneous as the first two. This is mostly because the crash itself took place only at the beginning of the segment; after a while it turned into a serious fire, as shown in frame 450 (see Fig. 1). Accordingly, the value peaks in the interval from frame 267 to frame 500 can be categorized as noise outliers; alternatively, this fire segment may be partitioned into further sub-segments indicating ignition, small fire and serious fire, respectively (depending on the problem statement). Additional analysis of the latter segment with the proposed approach may, however, help identify more precise segment boundaries.

CONCLUSION
A computationally simple procedure for time series clustering has been proposed in this paper (in scalar, vector and matrix forms). It is based on both modified iterative dynamic time warping and the Kohonen self-organizing clustering map. In contrast to the earlier works [12,13], combining the two proposed approaches allows segmentation to be performed in matrix form (which is more natural for multidimensional data processing) with the Frobenius norm. This makes it possible to avoid vectorization/devectorization and thus to increase processing speed, which is especially important in video processing. It should also be noted that the experiments were conducted on more than 20 different videos from the 'Destroyed in Seconds' documentary series, and clustering yielded shots with similar content-based characteristics. The approach presented here makes it possible to process huge arrays of high-dimensional data within acceptable time. The novel approach was designed for video sequence processing, though it can also be used in natural language and signal processing, DNA/RNA and financial data analysis, to name a few.

Figure 1 - Spatial and temporal segmentation of video frames

Figure 2 - Plot of temporal segmentation for the initial video sequence