An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Roberta Rodrigues de Lima, Anita M.R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker, Valderi Reis Quietinho Leithardt

Research output: peer-reviewed

7 Citations (Scopus)

Abstract

Classification problems arise in many different domains, and supervised learning algorithms have shown great promise in these areas. Classifying goods for international trade in Brazil is a real challenge because of the complexity involved in assigning the correct category code to a good, especially given the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, the official classification system for import and export products in Brazil. In particular, this article presents results from a BERT model pretrained specifically on Portuguese, as well as results from a multilingual pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving a Matthews correlation coefficient (MCC) of 0.8491, and confirm that the classifiers could be used to improve specialists' performance in the classification of goods.
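The MCC reported in the abstract is the Matthews correlation coefficient. As an illustration of how such a score can be computed for a multiclass problem, the following is a minimal sketch using the standard multiclass generalization of the coefficient. It is not taken from the paper: the function name `multiclass_mcc` and the labels "A" and "B" are hypothetical placeholders, and the toy data below is not the authors' data.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (hypothetical helper).

    Uses the generalized formula
        (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k how often class k truly occurs and p_k how often it is predicted.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)

    labels = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in labels)
    sum_pp = sum(v * v for v in p_counts.values())
    sum_tt = sum(v * v for v in t_counts.values())

    denom = sqrt((s * s - sum_pp) * (s * s - sum_tt))
    if denom == 0:
        # Degenerate case (for example, only one class present): return 0.0.
        return 0.0
    return (c * s - sum_pt) / denom


# Toy example with made-up labels (not real NCM codes).
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
print(round(multiclass_mcc(y_true, y_pred), 4))  # 0.5774
```

A score of 1 indicates perfect agreement between predicted and true classes and a score near 0 indicates agreement no better than chance, which is why the metric is often preferred over plain accuracy when class frequencies are uneven.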

Original language: English
Article number: 8
Journal: Big Data and Cognitive Computing
Volume: 6
Issue number: 1
DOIs
Publication status: Published - Mar 2022

Bibliographical note

Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.

Funding

Funders and funder numbers:
Foundation for Science and Technology
Fundação para a Ciência e a Tecnologia: COFAC/ILIND/COPEL ABS/3/2020, UIDB/04111/2020, UIDB/05064/2020
Ministério da Ciência, Tecnologia e Ensino Superior: UIDB/50008/2020
