SIMULTANEOUS TOPOLOGICAL CATEGORICAL DATA CLUSTERING AND CLUSTER CHARACTERIZATION

Authors

  • Lazhar Labiod
  • Nistor Grozavu
  • Younès Bennani

DOI:

https://doi.org/10.47839/ijc.10.1.732

Keywords:

Topological learning, Relational Analysis, Categorical data, Feature selection.

Abstract

In this paper we propose a new automatic learning model that performs topological clustering and feature selection simultaneously for quantitative datasets. We explore a new topological organization algorithm for categorical data clustering and visualization named RTC (Relational Topological Clustering). Clustering categorical data is generally more difficult than clustering numerical data because categorical values have no natural order. The proposed approach is based on the self-organization principle of Kohonen’s model and uses the Relational Analysis formalism, optimizing a cost function defined as a modified Condorcet criterion. We propose an iterative algorithm that scales linearly with the size of the dataset, identifies clusters naturally, and allows the clustering result to be visualized on a two-dimensional grid. The statistical scree test is then used to detect relevant and correlated features (or modalities) for each prototype. This test detects the most important variables automatically, without any parameters to set. The proposed approach was validated on various real datasets, and the experimental results show the effectiveness of the proposed procedure.
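
As a rough illustration of the feature-detection step, the sketch below applies a non-graphical scree test to a vector of per-feature scores for a single prototype. It uses the acceleration-factor variant, which places the elbow where the second difference of the sorted scores is largest. The function name, the example scores, and the choice of this particular variant are assumptions made for illustration; the abstract does not state how the scores are computed or which form of the test the authors use.

```python
def scree_acceleration_cutoff(scores):
    """Return the indices of the features that precede the scree 'elbow'.

    Assumption: `scores` holds one non-negative relevance score per feature
    for a single prototype (hypothetical input, not taken from the paper).
    """
    # Rank the features from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    values = [scores[j] for j in order]

    # With fewer than three values no second difference exists: keep all.
    if len(values) < 3:
        return order

    # Acceleration factor = discrete second difference at interior positions.
    accel = [
        values[i + 1] - 2 * values[i] + values[i - 1]
        for i in range(1, len(values) - 1)
    ]

    # The elbow is the interior position with the largest acceleration;
    # features ranked strictly before it are kept.
    elbow = 1 + max(range(len(accel)), key=lambda k: accel[k])
    return order[:elbow]


if __name__ == "__main__":
    example_scores = [0.05, 0.90, 0.10, 0.85, 0.08]
    print(scree_acceleration_cutoff(example_scores))
```

No threshold is passed in: the cutoff comes only from the shape of the sorted score curve, which matches the parameter-free behaviour the abstract describes.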

Published

2011-12-20

How to Cite

Labiod, L., Grozavu, N., & Bennani, Y. (2011). SIMULTANEOUS TOPOLOGICAL CATEGORICAL DATA CLUSTERING AND CLUSTER CHARACTERIZATION. International Journal of Computing, 10(1), 9-23. https://doi.org/10.47839/ijc.10.1.732

Section

Articles