Pedro Ortiz Suarez
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
We audit five web-crawled multilingual corpora and find that the lower-resource corpora have systematic quality issues.
Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi
PDF
Cite
Project
DOI
TACL
HAL
arXiv