Pedro Ortiz Suarez
Les modèles de langue contextuels Camembert pour le Français : impact de la taille et de l'hétérogénéité des données d'entrainement
We explore the impact of the training data size and heterogeneity on French language modeling. (Equal contribution by the first three authors).
Louis Martin, Benjamin Muller, Pedro Ortiz Suarez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Benoît Sagot, Djamé Seddah
PDF
Cite
Dataset
Project
TALN 2020
HAL
Website
Establishing a New State-of-the-Art for French Named Entity Recognition
We convert the NER annotations of the French TreeBank to a more user-friendly format and establish a new state of the art for French NER.
Pedro Ortiz Suarez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot
PDF
Cite
LREC 2020
HAL
arXiv
ACL Anthology
French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus
We investigate the impact of different types and sizes of training corpora on language models.
Murielle Popa-Fabre, Pedro Ortiz Suarez, Benoît Sagot, Éric de la Clergerie
PDF
Cite
CMLC-8
ACL Anthology
HAL
How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures
We explore the impact of OCR quality on grobid-dictionaries models.
Mohamed Khemakhem, Ioana Galleron, Geoffrey Williams, Laurent Romary, Pedro Ortiz Suarez
PDF
Cite
Project
TEI 2019
HAL
Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures
We propose a new pipeline to filter, clean, and classify Common Crawl by language, and we publish the final corpus under the name OSCAR.
Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary
PDF
Cite
Code
Dataset
Project
Slides
DOI
CMLC-7
Website
HAL