APPROACH FOR MINIMIZATION OF PHONEME GROUPS IN AUTHORSHIP ATTRIBUTION
Keywords: authorship attribution, authorship identification capability of a phoneme group, average frequency, statistical method, phonological level, Information Technology tools.
Abstract
The mathematical support developed for authorship attribution software combines statistical methods (Student's t-test and the Kolmogorov-Smirnov test) with a statistical model for determining significant differences between styles. Combining the two methods enhances the test validity of authorship attribution, since the same results are obtained by both methods. The model makes it possible to identify the consonant phoneme groups with high style identification capability, taking the position of the phoneme in a word into account. The greater the number of significant differences, the higher the authorship identification capability of the phoneme group. The system software is based on the algorithms of the combined methods and the statistical model, and it is written in Java, which provides platform independence. Minimizing the number of consonant phoneme groups makes style and authorship attribution more automated. Results of comparing the scientific, belles-lettres, conversational and newspaper styles are presented. The data obtained indicate that the combination of methods and the developed statistical model improve the test validity of style and authorship attribution.
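The decision rule outlined in the abstract can be illustrated with a minimal Java sketch. It is not the authors' software: the class name, method names and sample layout are hypothetical, and the thresholds are assumptions. The sketch computes the pooled-variance Student's t statistic and the two-sample Kolmogorov-Smirnov statistic for the frequencies of one phoneme group in two styles, counts a difference as significant only when both tests agree, and then counts significant style pairs as a rough measure of identification capability. The t critical value is supplied by the caller from a table, and the Kolmogorov-Smirnov threshold uses the usual asymptotic coefficient 1.358 for a 0.05 significance level.

```java
import java.util.Arrays;

public class PhonemeGroupSketch {

    /** Arithmetic mean of a sample. */
    public static double mean(double[] v) {
        return Arrays.stream(v).average().orElse(Double.NaN);
    }

    /** Two-sample Student's t statistic with pooled variance. */
    public static double tStatistic(double[] a, double[] b) {
        double ma = mean(a), mb = mean(b);
        double ssa = 0.0, ssb = 0.0;
        for (double x : a) ssa += (x - ma) * (x - ma);
        for (double x : b) ssb += (x - mb) * (x - mb);
        double pooled = (ssa + ssb) / (a.length + b.length - 2);
        return (ma - mb) / Math.sqrt(pooled * (1.0 / a.length + 1.0 / b.length));
    }

    /** Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs. */
    public static double ksStatistic(double[] a, double[] b) {
        double[] x = a.clone(), y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < x.length && j < y.length) {
            double v = Math.min(x[i], y[j]);
            while (i < x.length && x[i] <= v) i++;   // step past ties in x
            while (j < y.length && y[j] <= v) j++;   // step past ties in y
            d = Math.max(d, Math.abs((double) i / x.length - (double) j / y.length));
        }
        return d;
    }

    /** A difference counts only if both tests flag it (assumed 0.05 level). */
    public static boolean significantByBoth(double[] a, double[] b, double tCritical) {
        boolean byT = Math.abs(tStatistic(a, b)) > tCritical;
        // Asymptotic KS threshold for alpha = 0.05 (assumption of this sketch).
        double ksCritical = 1.358 * Math.sqrt((a.length + b.length) / (double) (a.length * b.length));
        boolean byKs = ksStatistic(a, b) > ksCritical;
        return byT && byKs;
    }

    /** Number of style pairs that differ significantly for one phoneme group. */
    public static int countSignificantPairs(double[][] styles, double tCritical) {
        int count = 0;
        for (int i = 0; i < styles.length; i++)
            for (int j = i + 1; j < styles.length; j++)
                if (significantByBoth(styles[i], styles[j], tCritical)) count++;
        return count;
    }

    public static void main(String[] args) {
        // Made-up frequencies of one phoneme group in samples from three styles.
        double[][] styles = {
            {0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.10, 0.11},
            {0.20, 0.21, 0.22, 0.20, 0.21, 0.22, 0.20, 0.21},
            {0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.10, 0.11}
        };
        // 2.145 is the two-sided 0.05 critical value of t for 14 degrees of freedom.
        System.out.println(countSignificantPairs(styles, 2.145));
    }
}
```

Under these assumptions, a phoneme group with a higher pair count would be a stronger candidate to keep when the set of groups is minimized. Requiring agreement between the two tests trades some sensitivity for fewer spurious differences.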
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.