FAIR@CLIB2026: FAIR Language Resources in NLP: Stewardship, Reuse and Long-Term Sustainability, Sofia, Bulgaria, September 7, 2026
Submission link: https://easychair.org/conferences/?conf=fairclib2026
FAIR Language Resources in NLP: Stewardship, Reuse and Long-Term Sustainability
Pre-conference Workshop at CLIB 2026
Sofia, Bulgaria
7 September 2026
Why this workshop and why now?
Language resources are the foundation of linguistic research and NLP. Corpora, lexicons, annotated datasets, benchmarks, and models are produced at an unprecedented pace. Yet their long-term stewardship, interoperability, and reuse remain inconsistent and often fragile. Rapid creation has outpaced sustainable design.
This workshop brings together researchers, infrastructure providers, data stewards, and policy actors committed to building durable language resource ecosystems. It will address the pressing challenges of sustaining datasets used in linguistic research and in the development of NLP systems, from documentation and versioning to governance, licensing, and infrastructure support.
The workshop will explore how FAIR principles (Findable, Accessible, Interoperable, Reusable) can be meaningfully operationalised for language resources in NLP and computational linguistics.
Important Dates
- Submission deadline: 22 April 2026
- Notification of acceptance: 22 May 2026
- Camera-ready deadline: To be confirmed
- Workshop date: 7 September 2026
Submission Guidelines
The workshop invites short papers (4-6 pages), which will be presented in 10-minute slots.
Submissions should use the CLIB Word template (camera-ready version), available at https://dcl.bas.bg/clib/instructions-for-authors/.
List of Topics
1. Technical Foundations
Designing language resources so they are interoperable, transparent, and structurally reusable.
- Domain-specific FAIR implementation strategies for corpora, lexicons, datasets, and models
- Metadata, paradata, and annotation transparency frameworks
- Repository architectures and infrastructure design for linguistic data
2. Lifecycle & Reuse
Ensuring language resources remain usable, traceable, and measurable across research cycles.
- Documentation, versioning, and provenance tracking for evolving resources
- Persistent identifiers and citation mechanisms for language datasets
- Methods for tracking, measuring, and evidencing reuse
- Critical reflections and lessons learned from implementation challenges
- From raw data to FAIR-ready assets: preprocessing, cleaning, and quality assurance workflows
- Replicability of experiments conducted on language resources
3. Policy & Sustainability
Creating the institutional and legal conditions that allow language resources to endure.
- Legal, ethical, and licensing considerations in sharing and reusing language data
- Governance structures and sustainability models beyond project funding
- Raising awareness of best practices and supporting communities in adapting them
Committees
Program Committee (under development)
- Edward J. Pinot Gray, DARIAH Coordination Office, Paris, France
- Egon W. Stemle, Institute for Applied Linguistics, Italy
- Olha Kanishcheva, Friedrich Schiller University Jena, Germany
- Petya Osenova, Faculty of Slavic Studies at Sofia University “St. Kliment Ohridski” and Department of Linguistic Modelling and Knowledge Processing at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences
- Ruslana Margova, GATE Institute, Sofia
Chairs
- Milena Dobreva (University of Strathclyde, IMI BAS)
- Ivan Lambov (IMI BAS)
Organizing Committee
- Krassimira Ivanova
- Teodora Gandova
Invited Speakers
- Kaja Dobrovoljc, Research Associate, Laboratory for Machine Learning and Language Technologies, Faculty of Computer and Information Science, University of Ljubljana, Slovenia
- Mietta Lennes, RI Specialist, FIN-CLARIN & Kielipankki – The Language Bank of Finland, Department of Digital Humanities, University of Helsinki, Finland
- Beth Knazook, Senior Programme Manager, Research and Engagement, Digital Repository of Ireland
Publication
We are finalising the proceedings information.
Venue
The workshop will be held in Sofia, Bulgaria.
Contact
All questions about submissions should be emailed to milena.dobreva AT ustrath.ac.uk.
Acknowledgement of support
The workshop organisation is partially supported by the project BG16RFPR002-1.016-0002 “ERA Chair in fostering digital cultural heritage via open innovations and open science” funded by the Programme “Research, Innovation and Digitalization for Smart Transformation” 2021-2027 (PRIDST) and co-funded by the European Union.
