An Agent-Driven Architecture for Harmful Meme Detection through Multimodal Decomposition

Orlando, G. M.; Perillo, M.; Russo, D.; Moscato, V.

doi:10.1109/ISM66958.2025.00031

Harmful meme detection poses a critical challenge for online moderation, as the multimodal and context-dependent nature of memes undermines the effectiveness of traditional unimodal classifiers. In this work, we propose an agent-driven architecture that combines multimodal decomposition with multiagent reasoning. The system extracts complementary information from memes through text extraction, image captioning, and contextual visual description, which are integrated into a unified textual representation. This representation is then analyzed by a set of LLM-powered agents, each instantiated with a distinct interpretative persona, whose assessments are consolidated by a decision aggregation module. Experimental results on the Facebook Hateful Memes dataset demonstrate that the multiagent approach significantly improves performance: accuracy increases from 54.25% (single-agent baseline) to 67.28%, while the true positive rate rises from 40.32% to 72.78%. These findings highlight the effectiveness of integrating multimodal decomposition with agent-based perspectives for harmful meme detection, ensuring both robustness and explainability in the decision-making process.
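The pipeline described in the abstract (decompose, unify into text, query persona agents, aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The three extraction stages are represented only by their string outputs, so no OCR or captioning model is called. Each LLM agent is replaced by a hypothetical keyword-matching stub standing in for a persona-conditioned model. The decision aggregation module is assumed to be a simple majority vote with ties resolved toward "harmful"; the abstract does not state which aggregation rule is used. All names (`MemeDecomposition`, `to_unified_text`, `make_keyword_agent`, `aggregate_majority`, `classify`, `accuracy_and_tpr`) are hypothetical. The last function shows how the two reported metrics, accuracy and true positive rate, are computed from binary labels.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class MemeDecomposition:
    """Hypothetical container for the three complementary views of one meme."""
    extracted_text: str   # text overlaid on the image (e.g. from an OCR step; assumed)
    caption: str          # generic image caption (e.g. from a captioning model; assumed)
    visual_context: str   # contextual description of the visual scene (assumed)


def to_unified_text(d: MemeDecomposition) -> str:
    """Merge the three views into one textual representation (one possible layout)."""
    return (
        f"[TEXT] {d.extracted_text}\n"
        f"[CAPTION] {d.caption}\n"
        f"[CONTEXT] {d.visual_context}"
    )


# An agent reads the unified text and votes 1 (harmful) or 0 (not harmful).
# In the paper each agent is an LLM with a persona; here it is any callable.
Agent = Callable[[str], int]


def make_keyword_agent(keywords: Sequence[str]) -> Agent:
    """Hypothetical stub standing in for a persona-conditioned LLM agent."""
    def agent(text: str) -> int:
        lowered = text.lower()
        return int(any(k in lowered for k in keywords))
    return agent


def aggregate_majority(votes: List[int]) -> int:
    """Assumed aggregation rule: majority vote, ties resolved toward harmful (1)."""
    counts = Counter(votes)
    return 1 if counts[1] >= counts[0] else 0


def classify(d: MemeDecomposition, agents: List[Agent]) -> int:
    """Unify the decomposed views, ask every agent, then aggregate the votes."""
    unified = to_unified_text(d)
    return aggregate_majority([agent(unified) for agent in agents])


def accuracy_and_tpr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Accuracy over all memes and true positive rate over the harmful ones."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == 1 and p == 1 for t, p in pairs)
    positives = sum(1 for t in y_true if t == 1)
    accuracy = correct / len(pairs)
    tpr = true_pos / positives if positives else 0.0
    return accuracy, tpr


if __name__ == "__main__":
    agents = [
        make_keyword_agent(["hate"]),
        make_keyword_agent(["hate", "attack"]),
        make_keyword_agent(["violence"]),
    ]
    meme = MemeDecomposition("we hate them", "a crowd of people", "a protest scene")
    print(classify(meme, agents))                        # 1 (votes: 1, 1, 0)
    print(accuracy_and_tpr([1, 0, 1, 0], [1, 0, 0, 0]))  # (0.75, 0.5)
```

With the three toy agents, two of three vote harmful, so the meme is classified as harmful. Replacing the stubs with LLM calls would only change how each agent produces its vote; the fusion and aggregation steps stay the same. True positive rate is reported alongside accuracy because it measures the fraction of harmful memes the system actually catches.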

Orlando, G. M., Perillo, M., Russo, D., Moscato, V. (2025). An Agent-Driven Architecture for Harmful Meme Detection through Multimodal Decomposition. In: 27th International Symposium on Multimedia (ISM 2025), Italy, pp. 91-97. doi:10.1109/ISM66958.2025.00031.