How to use ProMeText/aquilign-multilingual-segmenter with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="ProMeText/aquilign-multilingual-segmenter")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("ProMeText/aquilign-multilingual-segmenter")
model = AutoModelForTokenClassification.from_pretrained("ProMeText/aquilign-multilingual-segmenter")
Aquilign Multilingual Segmenter
Aquilign Multilingual Segmenter is a token-classification model for phrase-level segmentation of medieval and historical texts.
The model is designed to detect custom segmentation delimiters in multilingual historical corpora and is used as part of the Aquilign alignment workflow.
Model Description
The segmenter is built on the BertForTokenClassification architecture from Hugging Face’s transformers library.
It was fine-tuned on historical prose from the Multilingual Segmentation Dataset to identify phrase-level segmentation boundaries.
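A minimal sketch of how token-level predictions could be turned into segments, assuming the model marks the token that opens each new segment. The label name `LABEL_1`, the helper names, and the Latin sample in the usage note are illustrative assumptions, not taken from the model card; check `model.config.id2label` for the labels the model actually emits.

```python
from typing import Dict, List

# Assumption: label given to a token that opens a new segment.
# "LABEL_1" is only a placeholder; inspect model.config.id2label for the real name.
BOUNDARY_LABEL = "LABEL_1"


def split_at_boundaries(
    text: str,
    predictions: List[Dict],
    boundary_label: str = BOUNDARY_LABEL,
) -> List[str]:
    """Cut `text` before every token predicted as a segment boundary.

    `predictions` follows the shape returned by a token-classification
    pipeline without aggregation: dicts with "entity", "start" and "end".
    """
    # Collect character offsets of boundary tokens; skip missing offsets
    # and offset 0, since a cut at the very start would create nothing.
    cut_points = sorted(
        {
            p["start"]
            for p in predictions
            if p["entity"] == boundary_label and p.get("start")
        }
    )
    segments = []
    previous = 0
    for cut in cut_points:
        segment = text[previous:cut].strip()
        if segment:
            segments.append(segment)
        previous = cut
    tail = text[previous:].strip()
    if tail:
        segments.append(tail)
    return segments


def segment_with_model(text: str) -> List[str]:
    """Run the pipeline on `text` and split on its predictions (downloads the model)."""
    # Imported lazily so split_at_boundaries stays dependency-free.
    from transformers import pipeline

    pipe = pipeline(
        "token-classification",
        model="ProMeText/aquilign-multilingual-segmenter",
    )
    return split_at_boundaries(text, pipe(text))
```

With hand-written predictions, `split_at_boundaries("Et dixit ei rex quod veniret", [{"entity": "LABEL_1", "start": 12, "end": 15}])` cuts the text before "rex". The sketch does not handle a boundary label that lands on a subword piece in the middle of a word, and it relies on the character offsets that fast tokenizers provide.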
Supported Languages
- Latin
- French
- Castilian
- Portuguese
- Catalan
- English
- Italian
Intended Use
This model is intended for:
- phrase-level segmentation of medieval texts
- preprocessing parallel corpora before alignment
- multilingual medieval text alignment workflows
- digital philology and computational humanities research
It is designed primarily for use with Aquilign.
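The preprocessing step for parallel corpora might look roughly like the following sketch, which segments each version of a text and rejoins the segments with a visible delimiter. The function names, the `" | "` delimiter, and the full-stop stand-in segmenter are all hypothetical; Aquilign's own input format and delimiter may differ, and in practice the model would replace the stand-in.

```python
from typing import Callable, Dict, List


def segment_witnesses(
    witnesses: Dict[str, str],
    segmenter: Callable[[str], List[str]],
    delimiter: str = " | ",  # assumption: any marker absent from the texts
) -> Dict[str, str]:
    """Segment each text and rejoin its segments with a visible delimiter."""
    marked = {}
    for name, text in witnesses.items():
        # Drop empty segments and surrounding whitespace before joining.
        segments = [s.strip() for s in segmenter(text) if s.strip()]
        marked[name] = delimiter.join(segments)
    return marked


def naive_segmenter(text: str) -> List[str]:
    """Stand-in for the model: ends a segment at each full stop."""
    return [part + "." for part in text.split(".") if part.strip()]
```

Passing the segmenter in as a callable keeps the corpus-handling code independent of how segments are produced, so the stand-in can be swapped for a model-backed function without other changes.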
Related Resources
Citation
If you use this model, please cite the related dataset and publication.
Dataset
@dataset{ing2025multilingual,
author = {Ing, L. and Gille Levenson, M. and Macedo, C.},
title = {Multilingual Segmentation Dataset for Historical Prose (13th--16th c.)},
year = {2025},
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.16992629},
url = {https://doi.org/10.5281/zenodo.16992629},
license = {Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International}
}
Related Publication
@inproceedings{ing-etal-2026-phrase,
title = {Phrase-Level Segmentation on Medieval Corpora for Aligning Multilingual Texts},
author = {Ing, Lucence and Gille Levenson, Matthias and Macedo, Carolina},
booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
month = {May},
year = {2026},
pages = {936--946},
address = {Palma, Mallorca, Spain},
publisher = {European Language Resources Association (ELRA)},
doi = {10.63317/32huzuuokpfr}
}