APPROACH FOR MINIMIZATION OF PHONEME GROUPS IN AUTHORSHIP ATTRIBUTION

Authors

  • Iryna Khomytska
  • Vasyl Teslyuk
  • Iryna Bazylevych
  • Inna Shylinska

DOI:

https://doi.org/10.47839/ijc.19.1.1693

Keywords:

authorship attribution, authorship identification capability of a phoneme group, average frequency, statistical method, phonological level, Information Technology tools.

Abstract

The mathematical support developed for authorship attribution software combines two statistical methods (Student’s t-test and the Kolmogorov-Smirnov test) with a statistical model for determining significant differences between styles. Using the two methods together enhances the test validity of authorship attribution, since the same results are obtained by both methods. The model makes it possible to identify a consonant phoneme group with high style identification capability, taking the position of the phoneme in a word into account. The greater the number of significant differences, the higher the authorship identification capability of the phoneme group. The system software is based on the algorithms of this combination of methods and of the statistical model, and is written in Java, which provides platform independence. Minimizing the number of consonant phoneme groups makes the process of style and authorship attribution more automated. The results of comparing the scientific, belles-lettres, conversational and newspaper styles are presented. The data obtained allow us to assert that the combination of methods used and the statistical model developed improve the test validity of style and authorship attribution.
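The abstract does not give the formulas or the software interface, so the following is only a minimal Java sketch of the general idea, under stated assumptions: for one phoneme group, the average frequencies measured in text samples of each style are compared pairwise, a difference counts as significant only when both the pooled-variance Student's t-test and the two-sample Kolmogorov-Smirnov test flag it, and the number of significant pairs serves as a rough indicator of identification capability. The class name, method names, sample values, the 0.05 significance level, and the asymptotic Kolmogorov-Smirnov critical value are all assumptions made for illustration and are not taken from the authors' software.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only. All names, data and critical values here are
 * assumptions for the example, not the authors' implementation.
 */
final class PhonemeGroupComparison {

    /** Arithmetic mean of the per-sample frequencies. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x), sum = 0.0;
        for (double v : x) sum += (v - m) * (v - m);
        return sum / (x.length - 1);
    }

    /** Two-sample Student's t statistic with pooled variance. */
    static double tStatistic(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double pooled = ((n - 1) * variance(a) + (m - 1) * variance(b)) / (n + m - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooled * (1.0 / n + 1.0 / m));
    }

    /** Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs. */
    static double ksStatistic(double[] a, double[] b) {
        double[] x = a.clone(), y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < x.length && j < y.length) {
            double v = Math.min(x[i], y[j]);
            while (i < x.length && x[i] <= v) i++;   // step both CDFs past v
            while (j < y.length && y[j] <= v) j++;
            d = Math.max(d, Math.abs((double) i / x.length - (double) j / y.length));
        }
        return d;
    }

    /**
     * True when both tests flag a difference at the assumed 0.05 level.
     * tCritical is the two-sided Student critical value for n + m - 2 degrees
     * of freedom, supplied by the caller (for example from a statistical table).
     * The KS threshold uses the asymptotic coefficient 1.358, which is only
     * approximate for small samples.
     */
    static boolean significantByBoth(double[] a, double[] b, double tCritical) {
        double n = a.length, m = b.length;
        double ksCritical = 1.358 * Math.sqrt((n + m) / (n * m));
        return Math.abs(tStatistic(a, b)) > tCritical && ksStatistic(a, b) > ksCritical;
    }

    /** Counts the style pairs in which the phoneme group differs significantly. */
    static int countSignificantPairs(double[][] samplesByStyle, double tCritical) {
        int count = 0;
        for (int i = 0; i < samplesByStyle.length; i++)
            for (int j = i + 1; j < samplesByStyle.length; j++)
                if (significantByBoth(samplesByStyle[i], samplesByStyle[j], tCritical)) count++;
        return count;
    }

    public static void main(String[] args) {
        // Made-up frequencies of one phoneme group in five samples per style.
        double[][] samplesByStyle = {
            {0.10, 0.11, 0.12, 0.13, 0.14},
            {0.20, 0.21, 0.22, 0.23, 0.24},
            {0.10, 0.12, 0.11, 0.14, 0.13}
        };
        double tCritical = 2.306; // two-sided, alpha = 0.05, 8 degrees of freedom
        System.out.println("Significant style pairs: "
                + countSignificantPairs(samplesByStyle, tCritical));
    }
}
```

Under these assumptions, repeating the count for every consonant phoneme group (and, if desired, for each word position separately) would give a ranking from which the groups with the most significant differences could be retained. How the authors actually select and minimize the groups is described in the full paper, not here.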

Published

2020-03-31

How to Cite

Khomytska, I., Teslyuk, V., Bazylevych, I., & Shylinska, I. (2020). APPROACH FOR MINIMIZATION OF PHONEME GROUPS IN AUTHORSHIP ATTRIBUTION. International Journal of Computing, 19(1), 55-62. https://doi.org/10.47839/ijc.19.1.1693
