MDLText is an efficient, lightweight, and scalable multinomial text classifier. It learns incrementally at low cost and is robust to overfitting, both desirable properties in real-world applications, large-scale problems, and online scenarios.
This is the first known corpus for fake news detection in Portuguese. It contains aligned true and false news, meaning each false news item has a related true news counterpart.
ML-MDLText is an efficient and lightweight multilabel text classifier with incremental learning. It is based on the minimum description length principle and can be applied to multilabel classification directly, without transforming the classification problem. It takes advantage of dependency information among labels and naturally supports online learning. In experiments, its results were competitive with state-of-the-art online learning methods and with methods that transform a multilabel problem into several single-label ones.
GMDL is a lightweight, multiclass, online classifier. Despite its probabilistic nature, it can handle continuous features. In experiments on real-world datasets with different characteristics, it outperformed established online classification methods and proved robust to overfitting, a desirable property for large, dynamic, real-world classification problems.
This approach was built to classify questions into domains. A corpus of English news and Wikipedia articles was collected and preprocessed, leaving only textual content. A subset of IPTC categories was chosen, and the documents belonging to those categories were used to generate questions automatically. These questions were used to train a Multinomial Naïve Bayes model, which was then evaluated on human-generated questions from the same categories as the original documents.
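The train-on-generated, test-on-human setup described above can be illustrated with a minimal sketch. It is not the project's actual code: the tiny dataset, the category names `sport` and `economy`, the `tokenize` helper, and the `MultinomialNB` class below are all hypothetical, and the classifier is a plain textbook Multinomial Naïve Bayes with Laplace smoothing written with the standard library only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and keep alphabetic tokens only.
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


class MultinomialNB:
    """Textbook Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()              # questions seen per category
        self.term_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, questions, labels):
        for question, label in zip(questions, labels):
            self.class_docs[label] += 1
            for token in tokenize(question):
                self.term_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, question):
        # Tokens never seen during training are ignored.
        tokens = [t for t in tokenize(question) if t in self.vocab]
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            total = sum(self.term_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_docs[label] / n_docs)
            for token in tokens:
                count = self.term_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, questions, labels):
    hits = sum(model.predict(q) == y for q, y in zip(questions, labels))
    return hits / len(labels)


# Hypothetical stand-ins for automatically generated training questions.
train_questions = [
    "Which team won the football championship?",
    "Who scored the winning goal in the match?",
    "What did the central bank do to interest rates?",
    "How did the stock market react to inflation?",
]
train_labels = ["sport", "sport", "economy", "economy"]

# Hypothetical stand-ins for human-written evaluation questions.
human_questions = [
    "Which team scored in the match?",
    "Did the bank raise interest rates?",
]
human_labels = ["sport", "economy"]

model = MultinomialNB().fit(train_questions, train_labels)
print(accuracy(model, human_questions, human_labels))
```

Keeping the human-written questions out of training is the point of the design: the reported score then reflects how well a model trained on automatically generated questions transfers to questions people actually ask.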