Evaluating the effectiveness of fine-tuning in financial NLP: The case of Social Trading Action Detection

D'Amico, S.; Maurino, A.; Osborne, F.; Sperli', G.

doi:10.1016/j.ipm.2026.104910

Financial Natural Language Processing crucially leverages social media for market insights. However, most existing methods for this purpose rely on simple sentiment analysis models, which fail to capture the concrete trading intentions expressed in these discussions. While Large Language Models (LLMs) offer a promising alternative to simplistic sentiment analysis, the actual benefits of fine-tuning across different model families remain unclear in noisy, domain-specific contexts like online forums. To address this gap, we present a comprehensive assessment of the advantages and limitations of fine-tuning for Social Trading Action Detection (STAD), a novel task that aims to classify online posts into actionable categories, namely buy, sell, or other. In addition, we introduce FinReddit-2K, a manually annotated dataset consisting of 2123 Reddit posts, designed to serve as a benchmark for this task. Our experimental analysis goes beyond standard performance metrics and identifies both the types of errors that fine-tuning can successfully mitigate and those that it may inadvertently introduce. Through a systematic evaluation of 57 models, comparing 14 traditional models with 23 zero-shot LLMs and 20 fine-tuned variants, our results show that fine-tuning yields an average F1-score improvement of +15.1%. The best-performing model, a fine-tuned Mistral-7B, achieves an F1-score of 86.0%, although our analysis reveals that fine-tuning fails to produce meaningful performance gains in several scenarios.

Evaluating the effectiveness of fine-tuning in financial NLP: The case of Social Trading Action Detection / D'Amico, S., Maurino, A., Osborne, F., Sperli', G. - In: INFORMATION PROCESSING & MANAGEMENT. - ISSN 0306-4573. - 63:8(2026). [10.1016/j.ipm.2026.104910]