Resources and tools

Below are resources and tools developed in the research projects I have been involved in.

MDLText

MDLText is an efficient, lightweight, and scalable multinomial text classifier. It learns incrementally and is robust to overfitting, both desirable properties for real-world applications, large-scale problems, and online scenarios.

Fake.Br Corpus

This is the first known corpus for fake news detection in Portuguese. It contains aligned true and false news, meaning each false news item has a related true news counterpart.

ML-MDLText

ML-MDLText is an efficient, lightweight multilabel text classifier with incremental learning. It is based on the minimum description length principle and handles multilabel classification directly, without transforming the problem. It exploits dependency information among labels and naturally supports online learning. In experiments, its results were competitive with state-of-the-art online learning methods and with methods that transform a multilabel problem into several single-label ones.

GMDL

GMDL is a lightweight, multiclass, online classifier. Despite its probabilistic nature, it can handle continuous features. Experiments on real-world datasets with different characteristics showed that it outperformed established online classification methods and is robust to overfitting, a desirable property for large, dynamic, real-world classification problems.

Question Domain Classifier

This approach classifies questions into domains. A corpus of English news and Wikipedia articles was collected and preprocessed to keep only the textual content. A subset of IPTC categories was chosen, and the documents belonging to those categories were used to generate questions automatically. The generated questions were used to train a Multinomial Naïve Bayes model, which was then evaluated on human-written questions from the same categories as the original documents.
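
The train-on-generated, test-on-human setup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual code: the `MultinomialNB` class is a minimal hand-written version with add-one smoothing, and the questions and the domain labels (`sport`, `economy`) are invented placeholders rather than real IPTC categories or data from the corpus.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # questions seen per domain
        self.token_counts = defaultdict(Counter)  # token counts per domain
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Lowercase, drop question marks, split on whitespace.
        return text.lower().replace("?", " ").split()

    def fit(self, questions, domains):
        for text, domain in zip(questions, domains):
            tokens = self.tokenize(text)
            self.class_counts[domain] += 1
            self.token_counts[domain].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for domain, n in self.class_counts.items():
            counts = self.token_counts[domain]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total)  # log prior of the domain
            for tok in self.tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = domain, score
        return best


# Placeholder stand-ins for automatically generated training questions.
generated = [
    ("Which team won the football match?", "sport"),
    ("Who scored the winning goal?", "sport"),
    ("How did the stock market close today?", "economy"),
    ("What is the central bank interest rate?", "economy"),
]
# Placeholder stand-ins for human-written evaluation questions.
human = [
    ("Which team scored in the match?", "sport"),
    ("Did the bank change the interest rate?", "economy"),
]

model = MultinomialNB().fit(*zip(*generated))
predictions = [model.predict(q) for q, _ in human]
accuracy = sum(p == d for p, (_, d) in zip(predictions, human)) / len(human)
print(predictions, accuracy)
```

Keeping the evaluation questions separate from the generated ones mirrors the setup described above: the model never sees human-written questions during training, so the score reflects how well it transfers from generated questions to real ones.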

Rock Grains Dataset

This dataset contains images of fine and coarse gravel grains.
