Classification of Kannada documents using novel semantic symbolic representation and selection method

International Journal of Artificial Intelligence

Abstract

Kannada is one of the 22 scheduled languages of India and a low-resource regional language. Classifying Kannada documents is difficult because of the language's rich vocabulary, agglutinative terms, and lack of resources. A good representation and the selection of prominent features help address these challenges in document classification. In this paper, we propose a semantic symbolic representation and feature selection method that represents Kannada terms as interval values embedded with positional information. We also propose a method for selecting prominent discriminative symbolic feature vectors. A symbolic document classifier is then used to classify the Kannada documents. The proposed cluster-based symbolic representation preserves intra-class variance and reduces ambiguity in the classification of Kannada documents. Experiments are performed on two Kannada document datasets that are multilabel and unbalanced. A comparative analysis of the proposed method against other standard methods is also presented.
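To make the idea of an interval-valued, cluster-based representation concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes each document is already a numeric term-weight vector and that documents are already grouped into clusters within each class. It summarises each cluster as one interval per feature, here taken as mean minus and plus one standard deviation (an assumed interval definition), and classifies a document by the fraction of its features that fall inside those intervals. The function names, the toy data, and the similarity measure are all hypothetical; the semantic embedding, the positional information, and the feature selection step described in the abstract are not modelled.

```python
from statistics import mean, pstdev


def build_intervals(vectors):
    """Summarise a cluster of feature vectors as one interval per feature.

    Assumption: the interval is [mean - std, mean + std] for each feature.
    """
    columns = list(zip(*vectors))
    return [(mean(c) - pstdev(c), mean(c) + pstdev(c)) for c in columns]


def similarity(vector, intervals):
    """Fraction of features whose value lies inside the matching interval."""
    hits = sum(1 for v, (lo, hi) in zip(vector, intervals) if lo <= v <= hi)
    return hits / len(intervals)


def classify(vector, class_intervals):
    """Return the label whose best-matching cluster has the highest similarity."""
    return max(
        class_intervals,
        key=lambda label: max(similarity(vector, iv) for iv in class_intervals[label]),
    )


# Toy term-weight vectors (hypothetical numbers), one cluster per class.
clusters = {
    "sports": [[[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]],
    "politics": [[[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]],
}

# One interval vector per cluster, so a class with several clusters keeps
# several interval vectors instead of a single averaged one.
class_intervals = {
    label: [build_intervals(cluster) for cluster in groups]
    for label, groups in clusters.items()
}

predicted = classify([0.85, 0.15, 0.05], class_intervals)
print(predicted)
```

Keeping one interval vector per cluster, rather than one per class, is what lets a representation of this kind retain the spread of documents within a class, which is the intuition behind the intra-class variance claim above.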
