Constructing a neural network emotion classifier for multimodal data
Keywords:
machine learning, classification of emotions, multimodal data, neural networks
Abstract
This work develops a model for classifying human emotions from multimodal features. The article reviews existing work on emotion classification from voice and speech; describes the problem statement, data preparation, and solution methodology; and presents the results of experiments with different models. The Dusha dataset, which consists of audio recordings in Russian, was used for training. The experiments resulted in a model combining Wav2Vec2 and DistilBERT-small, which achieved a macro-averaged F1 score of 0.84 on the crowd subset and 0.62 on the podcast subset of the test set.
DOI: 10.54708/19926502_2025_29411039
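For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1 score is computed: F1 is calculated separately for each class and then averaged with equal weight, so rare emotions count as much as frequent ones. The function name `macro_f1` and the label strings in the usage note are illustrative assumptions, not taken from the article, and this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

As a usage example with made-up labels, `macro_f1(["anger", "anger", "neutral"], ["anger", "neutral", "neutral"])` averages the F1 of `anger` (2/3) and `neutral` (2/3). Because every class has equal weight, a model that performs poorly on an under-represented emotion is penalised even if overall accuracy is high.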
Published
2025-25-12