A hybrid approach to stegoanalysis based on multimodal large language models and convolutional neural networks

DOI: 10.31673/2412-9070.2026.027616

Authors

  • Ю. В. Мішкур (Mishkur Yu.), State University of Information and Communication Technologies, Kyiv
  • О. С. Захарченко (Zakharchenko O.), National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Abstract

The article considers a hybrid approach to stegoanalysis of digital images. It combines specialized convolutional neural networks (CNNs), which detect low-level statistical anomalies, with the semantic analysis capabilities of multimodal large language models (MLLMs). The approach aims to overcome three fundamental limitations of existing monolithic CNN detectors: poor generalization to unseen steganographic algorithms, no contextual analysis of multimodal metadata, and opaque decision-making. The hybrid system is implemented in TensorFlow/Keras using three CNN architectures (MobileNetV2, ResNet50 and EfficientNetB0), each modified with specialized input filtering layers based on a Laplacian kernel and the SRM (Spatial Rich Model) filter bank to extract steganographically significant noise residuals. The language models are integrated through an Ollama server deployed locally in Google Colab, running the Gemma 3:4b, Gemma 3:12b and Llama 3.2 Vision 11B models. The final decision is produced by a "soft" decision-level fusion (Decision Fusion) of the weighted CNN and MLLM outputs, with the weights adjusted dynamically according to the detected semantic context of the image.
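The residual extraction performed by the input filtering layers can be illustrated with a small, dependency-free sketch that applies a 3×3 Laplacian kernel to a grayscale patch. The kernel coefficients, the function name `high_pass_residual` and the "valid" border handling are assumptions for illustration, not the article's configuration, and the SRM filter bank is not reproduced. In TensorFlow/Keras, a fixed filter of this kind is commonly realized as a `Conv2D` layer whose weights are set to the kernel and marked non-trainable.

```python
# Minimal sketch of a fixed high-pass "residual" filter of the kind used as a
# stegoanalysis pre-processing layer. The 3x3 Laplacian kernel is an assumed
# choice; the article's exact kernel and its SRM filter bank are not shown.
from typing import List

Image = List[List[float]]

# 4-neighbour Laplacian kernel (hypothetical choice for illustration).
LAPLACIAN_3X3: Image = [
    [0.0, 1.0, 0.0],
    [1.0, -4.0, 1.0],
    [0.0, 1.0, 0.0],
]


def high_pass_residual(image: Image, kernel: Image = LAPLACIAN_3X3) -> Image:
    """Slide a 3x3 kernel over the image ('valid' borders) and return the residual.

    Smooth content largely cancels out, so the output is dominated by local,
    high-frequency deviations such as those introduced by LSB embedding.
    The kernel is symmetric, so correlation and convolution coincide here.
    """
    h, w = len(image), len(image[0])
    out: Image = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * image[i + di - 1][j + dj - 1]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    flat = [[100.0] * 5 for _ in range(5)]  # constant 5x5 patch
    stego = [row[:] for row in flat]
    stego[2][2] += 1.0                       # flip the LSB of the centre pixel
    print(high_pass_residual(flat)[1][1])    # 0.0
    print(high_pass_residual(stego)[1][1])   # -4.0
```

A single +1 change in one pixel produces a clear spike in the residual while the constant background cancels to zero, which is why such filters are placed in front of the CNN rather than letting it learn from raw pixel intensities alone.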
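The soft fusion step can be sketched as a context-dependent weighted average of the stego probabilities returned by the two components. Everything concrete here is hypothetical: the context labels, the weight table `CNN_WEIGHT_BY_CONTEXT`, the 0.5 decision threshold and the function `fuse` are illustrative assumptions, and the article's actual weighting rule may differ.

```python
# Hypothetical sketch of soft decision-level fusion of a CNN stego probability
# and an MLLM stego probability. Context labels and weights are illustrative
# assumptions, not values taken from the article.
from dataclasses import dataclass
from typing import Dict

# Hypothetical weight given to the CNN score for each detected semantic
# context; the MLLM score receives the complement (1 - w_cnn).
CNN_WEIGHT_BY_CONTEXT: Dict[str, float] = {
    "natural_photo": 0.7,
    "screenshot_or_text": 0.5,
    "unknown": 0.6,
}


@dataclass
class FusionResult:
    p_stego: float   # fused probability that the image carries a payload
    is_stego: bool   # thresholded decision


def fuse(p_cnn: float, p_mllm: float, context: str,
         threshold: float = 0.5) -> FusionResult:
    """Weighted average of two probabilities; the weight depends on context.

    Contexts missing from the table fall back to the "unknown" weight.
    """
    w_cnn = CNN_WEIGHT_BY_CONTEXT.get(context, CNN_WEIGHT_BY_CONTEXT["unknown"])
    p = w_cnn * p_cnn + (1.0 - w_cnn) * p_mllm
    return FusionResult(p_stego=p, is_stego=p >= threshold)


if __name__ == "__main__":
    print(fuse(0.8, 0.3, "natural_photo").is_stego)  # True
```

Keeping the weights in an explicit lookup table makes the fusion rule easy to inspect, which fits the interpretability goal; a weighting learned on validation scores would be an alternative design.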
The approach was evaluated experimentally on a synthetic dataset built from CIFAR-10 with LSB embedding and on the ALASKA2 benchmark dataset. The ResNet50 + Gemma 3:12b configuration achieved the highest detection accuracy: 95.8% on CIFAR-10 and 91.7% on ALASKA2. These results indicate that the hybrid approach is a promising way to improve the accuracy, generalizability and interpretability of stegoanalysis systems.
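The synthetic CIFAR-10 stego set can be approximated with plain LSB replacement. This is a minimal sketch under assumptions: the embedding rate, the pseudo-random choice of positions and the helper name `embed_lsb_random` are hypothetical, since the abstract does not state the payload size or embedding order used. CIFAR-10 images are 32×32 RGB with 8-bit channels, so one flattened image has 3072 values.

```python
# Hypothetical sketch of LSB replacement for building cover/stego pairs.
# The rate and position-selection scheme are assumptions, not the article's.
import random
from typing import List, Tuple


def embed_lsb_random(pixels: List[int], rate: float = 0.4,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Embed random bits by LSB replacement at a pseudo-random subset of values.

    pixels: flattened 8-bit values (0..255), e.g. a 32x32x3 image -> 3072 ints.
    rate:   fraction of values that carry one payload bit (hypothetical).
    Returns the stego values and the sorted positions that carry payload bits.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible datasets
    n_bits = int(len(pixels) * rate)
    positions = rng.sample(range(len(pixels)), n_bits)
    stego = list(pixels)
    for pos in positions:
        bit = rng.getrandbits(1)               # random payload bit
        stego[pos] = (stego[pos] & ~1) | bit   # clear the LSB, then set it
    return stego, sorted(positions)


if __name__ == "__main__":
    cover = [(7 * i) % 256 for i in range(32 * 32 * 3)]  # stand-in image
    stego, used = embed_lsb_random(cover, rate=0.5, seed=42)
    print(len(used))  # 1536
```

Because LSB replacement changes each value by at most 1, the resulting pairs are visually identical, and detection has to rely on the statistical residuals targeted by the CNN input filters.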

Keywords: stegoanalysis, steganography, convolutional neural networks, large language models, MobileNetV2, ResNet50, EfficientNet, Gemma3, Llama, Ollama, hybrid architecture. 

Published

2026-04-26

Section

Articles