Constructing a neural network emotion classifier for multimodal data

Authors

  • Kosachev Ilya Sergeevich Ufa University of Science and Technology
  • Smetanina Olga Nikolaevna Ufa University of Science and Technology
  • Sasonova Yekaterina Yuryevna Ufa University of Science and Technology

Keywords:

machine learning, classification of emotions, multimodal data, neural networks

Abstract

This work is devoted to the development of a model for classifying human emotions from multimodal features. The article reviews existing work on emotion classification from voice and speech, formulates the emotion classification task, describes data preparation and the solution methodology, and presents the results of experiments with different models. The Dusha dataset, consisting of Russian-language audio recordings, was used for training. The experiments produced a model combining Wav2Vec2 and DistilBERT-small, which achieved a macro-averaged F1 score (f1-macro) of 0.84 on the crowd subset of the test set and 0.62 on the podcast subset.

DOI: 10.54708/19926502_2025_29411039
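The abstract reports results as macro-averaged F1. As a hedged illustration of what that metric measures, here is a minimal pure-Python sketch. It is not the authors' evaluation code. Per-class F1 is computed as 2TP / (2TP + FP + FN), and the per-class scores are then averaged with equal weight, so a rare emotion class counts as much as a frequent one. The function name `macro_f1` and the four emotion labels in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro-F1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Use every class that appears in either the references or the predictions
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # missed an instance of class t

    f1_scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Convention: a class with no true and no predicted samples gets F1 = 0
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    # Macro average: every class contributes equally, regardless of its frequency
    return sum(f1_scores) / len(f1_scores)


# Illustrative (made-up) labels and predictions, not data from the paper
y_true = ["angry", "sad", "neutral", "positive", "neutral", "sad"]
y_pred = ["angry", "neutral", "neutral", "positive", "neutral", "sad"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8667
```

With the same label set, the result should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`. Because macro averaging does not weight classes by frequency, it is a stricter measure than accuracy when the emotion classes are imbalanced.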

Author Biographies

Kosachev Ilya Sergeevich, Ufa University of Science and Technology

Second-year graduate student at the Ufa University of Science and Technology, 32 Zaki Validi St., Ufa, Republic of Bashkortostan, 450076, Russia, ORCID ID: 0009-0005-2812-6777, Scopus Author ID: 58617515300

Smetanina Olga Nikolaevna, Ufa University of Science and Technology

Doctor of Technical Sciences, Associate Professor, Professor of the CMaC Department at the Ufa University of Science and Technology

Sasonova Yekaterina Yuryevna, Ufa University of Science and Technology

Candidate of Technical Sciences, Associate Professor, Associate Professor of the CMaC Department at the Ufa University of Science and Technology

Published

2025-12-25
