Deception detection represents a critical challenge in domains such as security, forensics, and social computing. Traditional lie detection methods based on physiological signals (e.g., the polygraph) suffer from invasiveness, low generalizability, and limited scientific credibility. Recent advances in Artificial Intelligence (AI) have fostered the development of data-driven approaches that leverage multimodal analysis of human behavior. In this work, we propose a novel audio-visual framework for automatic deception detection, integrating temporal representations extracted independently from speech and facial features. Our architecture adopts modality-specific deep models to capture discriminative temporal patterns from acoustic features, facial Action Units, and emotional dynamics. Audio and visual predictions are combined through a late fusion strategy, with a meta-learning module to enhance performance. Extensive experiments on a real-world courtroom dataset demonstrate the effectiveness of our approach, which achieves state-of-the-art results and surpasses competitive unimodal and multimodal baselines. This study provides new insights into the design of non-invasive AI systems for deception detection in unconstrained environments.
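The late fusion step described above can be illustrated with a small sketch. The abstract does not specify the form of the meta-learning module, so this example assumes a simple stacking scheme: a logistic regression meta-learner trained on the deception probabilities produced by two unimodal models. All names (`train_meta_learner`, `fuse`) and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_learner(audio_probs, visual_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on the two unimodal scores (stacking).

    Assumption: each unimodal model already outputs one deception
    probability per clip; the meta-learner only learns how to weigh them.
    """
    w_a, w_v, b = 0.0, 0.0, 0.0
    n = len(labels)
    for _ in range(epochs):
        g_a = g_v = g_b = 0.0
        for pa, pv, y in zip(audio_probs, visual_probs, labels):
            err = sigmoid(w_a * pa + w_v * pv + b) - y  # gradient of log loss
            g_a += err * pa
            g_v += err * pv
            g_b += err
        # Batch gradient descent update on the averaged gradient.
        w_a -= lr * g_a / n
        w_v -= lr * g_v / n
        b -= lr * g_b / n
    return w_a, w_v, b


def fuse(params, audio_prob, visual_prob):
    """Late fusion: combine the two unimodal probabilities into one score."""
    w_a, w_v, b = params
    return sigmoid(w_a * audio_prob + w_v * visual_prob + b)


# Hypothetical held-out predictions from the audio and visual models
# (1 = deceptive, 0 = truthful). These values are invented for illustration.
audio = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]
visual = [0.8, 0.6, 0.4, 0.1, 0.9, 0.3]
labels = [1, 1, 0, 0, 1, 0]

params = train_meta_learner(audio, visual, labels)
fused_score = fuse(params, 0.85, 0.75)
is_deceptive = fused_score > 0.5
```

Because fusion happens at the prediction level, each modality can be trained and tuned separately, and the meta-learner only needs the per-clip scores. In practice the meta-learner would be fitted on predictions for clips that the unimodal models did not see during training, to avoid overfitting the fusion weights.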
Truth or Lie: An Audio-Visual Approach to Deception Detection / Galli, A.; Gravina, M.; Pascarella, A. E.; Di Serio, F.; Cipollaro, D.; Moscato, V.; Sansone, C. - 16168 (2025), pp. 189-199. (23rd International Conference on Image Analysis and Processing, ICIAP 2025 ita 2025) [10.1007/978-3-032-10192-1_16].
Truth or Lie: An Audio-Visual Approach to Deception Detection
Galli A.;Gravina M.;Pascarella A. E.;Di Serio F.;Cipollaro D.;Moscato V.;Sansone C.
2025


