PROGRAMMING STYLE ON SOURCE CODE PLAGIARISM AND COLLUSION DETECTION
Keywords: source code, plagiarism and collusion, similarity detection, programming style, computing education.
Abstract
This paper applies programming style to source code plagiarism and collusion detection with two aims: to capture obvious attempts at such academic dishonesty, whose characteristics most detection techniques ignore, and to prioritise non-coincidental similarity over coincidental similarity, since only the former should raise suspicion. The technique addresses the first aim with pairwise programming style similarity and the second with dishonesty probability, which measures how significantly the programming style of the author’s current submission differs from that of their previous submissions. According to our evaluation, programming style similarity can increase precision because, when code is copied, the programming style may be unconsciously carried over (especially by novice students). Dishonesty probability increases not only precision but also recall, f-score, and the resulting similarity degree of suspected pairs, because copied code commonly has a programming style different from the student’s usual style (captured from previous submissions). Our detection technique is comparable to a common technique in academia, except that it takes longer to process because more hints are generated and considered.
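The two measures named in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the paper's implementation: the style features (indentation, brace placement, comment lines, identifier casing), the use of cosine similarity, and the names `style_vector`, `cosine`, and `dishonesty_score` are all hypothetical, since the abstract does not specify how programming style is captured or compared.

```python
import math
import re


def style_vector(code: str) -> list[float]:
    """Hypothetical style features; the paper's actual feature set is not given here."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    tab_indent = sum(ln.startswith("\t") for ln in lines) / n
    space_indent = sum(ln.startswith(" ") for ln in lines) / n
    brace_own_line = sum(ln.strip() == "{" for ln in lines) / n
    brace_same_line = sum(
        ln.rstrip().endswith("{") and ln.strip() != "{" for ln in lines
    ) / n
    comment_lines = sum(ln.strip().startswith("//") for ln in lines) / n
    # Identifier casing: share of snake_case versus camelCase names.
    snake = len(re.findall(r"\b[a-z]+_[a-z_]+\b", code))
    camel = len(re.findall(r"\b[a-z]+[A-Z][A-Za-z]*\b", code))
    total = max(snake + camel, 1)
    return [tab_indent, space_indent, brace_own_line, brace_same_line,
            comment_lines, snake / total, camel / total]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, used here as an assumed pairwise style similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dishonesty_score(current: str, previous: list[str]) -> float:
    """Assumed stand-in for dishonesty probability: style change versus the
    author's average style over previous submissions (0 = unchanged)."""
    if not previous:
        return 0.0  # no history, so no evidence of a style change
    vectors = [style_vector(p) for p in previous]
    mean_style = [sum(col) / len(vectors) for col in zip(*vectors)]
    return max(0.0, 1.0 - cosine(style_vector(current), mean_style))


# Toy submissions: the first two share one style, the third uses another.
own_old = "int main()\n{\n\tint total_sum = 0;\n\treturn total_sum;\n}\n"
own_new = "int add()\n{\n\tint first_value = 1;\n\treturn first_value;\n}\n"
copied = "int main() {\n    int totalSum = 0;\n    return totalSum;\n}\n"

print(dishonesty_score(own_new, [own_old]))
print(dishonesty_score(copied, [own_old]))
```

In this toy setup, a submission written in the author's usual style scores close to 0, while one with different indentation, brace placement, and identifier casing scores close to 1. A real detector would need a richer, language-aware feature set and a calibrated threshold.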
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.