preprocessing

URI: https://gptkb.org/prop/preprocessing

10 triples

GPTKB property

Subject	Object
gptkb:KNN_(K-Nearest_Neighbors)	feature scaling
gptkb:KNN_(K-Nearest_Neighbors)	normalization
gptkb:CNN/Daily_Mail	anonymized entities
gptkb:CNN/Daily_Mail	tokenization
gptkb:CNN/Daily_Mail	sentence splitting
gptkb:CNN/DailyMail_dataset	anonymized entities
gptkb:CNN/DailyMail_dataset	non-anonymized version available
gptkb:C4_(Colossal_Clean_Crawled_Corpus)	cleaned of boilerplate and non-English text
gptkb:C4_(Colossal_Clean_Crawled_Corpus)	filtered for quality
gptkb:C4_(Colossal_Clean_Crawled_Corpus)	deduplicated
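The KNN entries (feature scaling, normalization) reflect the fact that KNN ranks neighbors by raw distance. Without scaling, a feature with a large numeric range (for example, income in dollars) can swamp one with a small range (for example, age in years). The following is a minimal Python sketch that assumes purely numeric features, Euclidean distance, and min-max scaling fitted on the training data only. The function names and toy data are hypothetical illustrations and are not part of GPTKB.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return per-feature minima and maxima computed from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_scale(point, mins, maxs):
    """Rescale one point to [0, 1] per feature; constant features map to 0.0."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(point, mins, maxs)
    ]


def knn_predict(train, labels, query, k=3):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(zip(train, labels), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [age in years, income in dollars].
train_raw = [[25, 40000], [30, 42000], [45, 90000], [50, 95000]]
labels = ["A", "A", "B", "B"]

# Fit the scaling on the training set, then apply the same transform to queries.
mins, maxs = fit_min_max(train_raw)
train_scaled = [min_max_scale(row, mins, maxs) for row in train_raw]
query_scaled = min_max_scale([26, 41000], mins, maxs)

prediction = knn_predict(train_scaled, labels, query_scaled, k=3)
```

The scaling parameters are fitted only on the training rows and then reused for each query, so training and query points share one coordinate system. Z-score standardization (subtracting the mean and dividing by the standard deviation) is a common alternative to min-max scaling and would fit into the same place in this sketch.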