Harmful meme detection poses a critical challenge for online moderation, as the multimodal and context-dependent nature of memes undermines the effectiveness of traditional unimodal classifiers. In this work, we propose an agent-driven architecture that combines multimodal decomposition with multiagent reasoning. The system extracts complementary information from memes through text extraction, image captioning, and contextual visual description, which are integrated into a unified textual representation. This representation is then analyzed by a set of LLM-powered agents, each instantiated with a distinct interpretative persona, whose assessments are consolidated by a decision aggregation module. Experimental results on the Facebook Hateful Memes dataset demonstrate that the multiagent approach significantly improves performance: accuracy increases from 54.25% (single-agent baseline) to 67.28%, while the true positive rate rises from 40.32% to 72.78%. These findings highlight the effectiveness of integrating multimodal decomposition with agent-based perspectives for harmful meme detection, ensuring both robustness and explainability in the decision-making process.
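The pipeline described in the abstract (decomposition into three textual views, a unified representation, persona agents, and an aggregation step) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `MemeParts` fields, the keyword-based stand-ins used in place of LLM-powered agents, the persona names, and the strict majority vote chosen as the aggregation rule. The extraction steps (OCR, captioning, contextual description) are assumed to have already produced strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MemeParts:
    """Hypothetical container for the three views extracted from a meme."""
    overlay_text: str   # text extracted from the image
    caption: str        # short image caption
    context: str        # contextual visual description


def unify(parts: MemeParts) -> str:
    """Merge the three views into one textual representation."""
    return (
        f"Text in meme: {parts.overlay_text}\n"
        f"Image caption: {parts.caption}\n"
        f"Visual context: {parts.context}"
    )


# An agent maps the unified representation to a harmful / not-harmful vote.
Agent = Callable[[str], bool]


def make_keyword_agent(persona: str, cues: List[str]) -> Agent:
    """Stand-in for an LLM persona agent (assumption): votes harmful
    when any of its cue strings appears in the representation."""
    def agent(representation: str) -> bool:
        text = representation.lower()
        return any(cue in text for cue in cues)
    agent.__name__ = persona
    return agent


def aggregate(votes: List[bool]) -> bool:
    """Assumed aggregation rule: strict majority of agents votes harmful."""
    return sum(votes) > len(votes) / 2


def classify(parts: MemeParts, agents: List[Agent]) -> bool:
    """Run every persona agent on the unified text and aggregate the votes."""
    representation = unify(parts)
    votes = [agent(representation) for agent in agents]
    return aggregate(votes)


# Illustrative personas and inputs (all invented for this sketch).
AGENTS = [
    make_keyword_agent("moderator", ["go back", "vermin"]),
    make_keyword_agent("sociologist", ["vermin", "border"]),
    make_keyword_agent("casual_user", ["vermin"]),
]
flagged = MemeParts("they are vermin", "a crowd of people",
                    "people waiting at a border crossing")
benign = MemeParts("monday again", "a sleepy cat",
                   "a cat lying on a keyboard")

if __name__ == "__main__":
    print(classify(flagged, AGENTS))
    print(classify(benign, AGENTS))
```

In a real system each agent would wrap a prompted LLM call and could also return a rationale, which is where the explainability mentioned in the abstract would come from; the aggregation module could likewise be something other than a majority vote.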

An Agent-Driven Architecture for Harmful Meme Detection through Multimodal Decomposition / Orlando, G. M.; Perillo, M.; Russo, D.; Moscato, V. - (2025), pp. 91-97. (27th International Symposium on Multimedia, ISM 2025 ita 2025) [10.1109/ISM66958.2025.00031].

An Agent-Driven Architecture for Harmful Meme Detection through Multimodal Decomposition

Orlando G. M.; Perillo M.; Russo D.; Moscato V.
2025


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1044928