The Sound and Sight of Confidence: An Audiovisual ML Approach

Authors

  • Jyothsna AN
  • Pamela Vinitha Eric
  • M.S. Smitha Rao

Keywords

Multimodality, video-audio fusion, Machine Learning, Gradient Boosting, Early Fusion

Abstract

Confidence plays a crucial role in conversation and has gained importance in various fields. Prior research has shown that a speaker's confidence can be gauged from the face as well as from speech. This paper presents a study on detecting speakers' confidence levels using a multimodal AI approach that combines video and audio modalities. Advances in technology have significantly improved the capture of these cues, and obtaining them from both video and audio is central to the multimodal approach. We extracted head pose and gaze direction as video features, and spectral and prosodic features as audio features. On the fused test set, Gradient Boosting Machines achieved an accuracy of 85% and an AUC of 0.98, which supports the multimodal approach. The findings highlight the importance of integrating visual and auditory cues to improve the accuracy of confidence level detection systems, with potential applications in education, public speaking, and virtual communication platforms.
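As a rough illustration of the early (feature-level) fusion named in the keywords, the sketch below joins one video feature vector and one audio feature vector per clip into a single fused vector. It is a minimal sketch under stated assumptions, not the authors' pipeline: the feature names, the toy values, the helper names `standardize` and `early_fusion`, and the choice to z-score each modality before concatenation are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column of a list of equal-length feature vectors."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Constant columns have zero spread; fall back to 1.0 to avoid dividing by zero.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def early_fusion(video_rows, audio_rows):
    """Concatenate per-clip video and audio vectors into one fused vector per clip."""
    if len(video_rows) != len(audio_rows):
        raise ValueError("each clip needs one video and one audio vector")
    # Assumption: scale each modality first so neither dominates the fused vector.
    video_std = standardize(video_rows)
    audio_std = standardize(audio_rows)
    return [v + a for v, a in zip(video_std, audio_std)]


# Hypothetical toy values: [head_yaw_deg, head_pitch_deg, gaze_x, gaze_y]
video = [[2.0, -1.0, 0.10, 0.00], [15.0, 8.0, 0.45, -0.30], [4.0, 0.0, 0.05, 0.02]]
# Hypothetical toy values: [mean_pitch_hz, energy, spectral_centroid_hz]
audio = [[180.0, 0.62, 1500.0], [140.0, 0.35, 1100.0], [175.0, 0.58, 1450.0]]

fused = early_fusion(video, audio)
print(len(fused), len(fused[0]))  # 3 clips, 4 video + 3 audio = 7 features each
```

The fused vectors would then be passed to a single classifier, for example a gradient boosting model such as scikit-learn's `GradientBoostingClassifier`, so that the model sees both modalities at once instead of combining separate per-modality predictions.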

Published

2025-10-02

How to Cite

AN, J., Vinitha Eric, P., & Smitha Rao, M. (2025). The Sound and Sight of Confidence: An Audiovisual ML Approach. International Journal of Computing, 24(3), 545-551. Retrieved from https://computingonline.net/computing/article/view/4191

Section

Articles