TY - GEN
T1 - QiBERT - Classifying Online Conversations
T2 - 14th IFIP WG 5.5/SOCOLNET Advanced Doctoral Conference on Computing, Electrical and Industrial Systems, DoCEIS 2023
AU - Saraiva, Bruno David Ferreira
AU - Marques-Pita, Manuel
AU - Matos-Carvalho, João Pedro
AU - Filho, Zuil António Pirola
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Recent developments in online communication and their use in everyday life have caused an explosion in the amount of a new genre of text data: short text. The need to classify this type of text based on its content therefore has significant implications in many areas. Online debates are no exception, since they provide access to information about the opinions, positions and preferences of their users. This paper aims to use data obtained from online social conversations in Portuguese schools (short text) to observe behavioural trends and to see whether students remain engaged in the discussion when stimulated. This project used state-of-the-art (SoA) Machine Learning (ML) algorithms and methods, through BERT-based models, to classify whether utterances are in or out of the debate subject. Using SBERT embeddings as features, with supervised learning, the proposed model achieved an average accuracy above 0.95 for classifying online messages. Such improvements can help social scientists better understand human communication, behaviour, discussion and persuasion.
AB - Recent developments in online communication and their use in everyday life have caused an explosion in the amount of a new genre of text data: short text. The need to classify this type of text based on its content therefore has significant implications in many areas. Online debates are no exception, since they provide access to information about the opinions, positions and preferences of their users. This paper aims to use data obtained from online social conversations in Portuguese schools (short text) to observe behavioural trends and to see whether students remain engaged in the discussion when stimulated. This project used state-of-the-art (SoA) Machine Learning (ML) algorithms and methods, through BERT-based models, to classify whether utterances are in or out of the debate subject. Using SBERT embeddings as features, with supervised learning, the proposed model achieved an average accuracy above 0.95 for classifying online messages. Such improvements can help social scientists better understand human communication, behaviour, discussion and persuasion.
KW - Natural Language Processing (NLP)
KW - Online Conversation
KW - Sentence Embeddings
KW - Short Text
KW - Supervised Learning
KW - Text Classification
UR - http://www.scopus.com/inward/record.url?scp=85164933504&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-36007-7_16
DO - 10.1007/978-3-031-36007-7_16
M3 - Conference contribution
AN - SCOPUS:85164933504
SN - 9783031360060
T3 - IFIP Advances in Information and Communication Technology
SP - 216
EP - 229
BT - Technological Innovation for Connected Cyber Physical Spaces - 14th IFIP WG 5.5/SOCOLNET Doctoral Conference on Computing, Electrical and Industrial Systems, DoCEIS 2023, Proceedings
A2 - Camarinha-Matos, Luis M.
A2 - Ferrada, Filipa
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 5 July 2023 through 7 July 2023
ER -