Dominik Schlechtweg
I did my PhD at the IMS (University of Stuttgart), working with Sabine Schulte im Walde on the automatic detection of lexical semantic change. I held a PhD scholarship from the Konrad Adenauer Foundation. After my PhD I did a research internship with Katrin Erk at the University of Texas, Austin. Since February 2022 I have been a postdoctoral researcher at the IMS (University of Stuttgart), working in the 6-year research program Change is Key!.
Interests
- optimization of human text data annotation processes
- automation of lexicographic processes
- application of lexical semantic computational models
- statistical modeling of word meaning
Projects
- Word Usage Graphs represent usages of a word as nodes in a graph which are connected by weighted edges representing (human-annotated) semantic proximity.
[blog] [link]
- DURel Annotation Tool to obtain word usage graphs from human annotations
[slides] [poster] [blog] [link]
- LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
[pdf] [link]
- SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
[pdf] [slides]
- Workshop on Empirical Studies of Word Sense Divergences across Language Varieties
[link]
Talks (selected)
- DURel Annotation Tool - Prospects on a Workbench for Lexicographers.
Talk at Kick-Off Event of the RJ Research Program “Change is Key!”, Gothenburg, September 8th, 2022.
[slides] [link]
- Human and Computational Measurement of Lexical Semantic Change.
Keynote Talk at 3rd Workshop on Computational Detection of Language Change 2022 @ ACL, Dublin, May 26th, 2022.
[slides] [link]
- Human and Computational Measurement of Lexical Semantic Change.
PhD Defense Talk at IMS, University of Stuttgart, March 24th, 2022.
[slides]
- DURel Annotation Tool.
Talk at StuTS 69 + TaCoS 2021 (Online), May 9th, 2021.
[slides] [poster] [blog] [link]
- State-of-the-art models in lexical semantic change detection.
Invited Talk at SFB-TRR 161 (University of Konstanz), January 18th, 2021.
[slides] [link]
- Sparse Usage Graphs as Model for Word Meaning in Context.
Keynote Talk at 2nd Workshop on Computational Detection of Language Change 2020, University of Gothenburg, November 25th, 2020.
[slides] [link]
- Efficient Manual Word Sense Clustering on Historical Corpora.
Invited Talk at The Alan Turing Institute (London), November 11th, 2019.
- Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling.
Invited Talk at CIS, LMU Munich, July 24th, 2019.
[slides] [link]
- A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains.
Invited Talk at University of Helsinki, February 10th, 2019.
[slides] [link]
- Comparing Annotation Frameworks for Lexical Semantic Change.
Talk at 1st Workshop on Computational Detection of Language Change 2018, University of Gothenburg, November 7th, 2018.
[slides]
- Problems of DURel annotation measures for semantic change.
Talk at SemRel research group at IMS, University of Stuttgart, February 1st, 2018.
[slides]
Publications
- LSCDiscovery: A shared task on semantic change discovery and detection in Spanish. 2022.
Frank D. Zamora-Reina, Felipe Bravo-Marquez, Dominik Schlechtweg
Proceedings of the 3rd International Workshop on Computational Approaches to Historical Language Change
[pdf] [slides] [bib]
- DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish. 2022.
Gioia Baldissin, Dominik Schlechtweg, Sabine Schulte im Walde
Proceedings of the 13th Language Resources and Evaluation Conference
[pdf] [slides] [poster] [video] [bib]
- Modeling Sense Structure in Word Usage Graphs with the Weighted Stochastic Block Model. 2021.
Dominik Schlechtweg, Enrique Castaneda, Jonas Kuhn, Sabine Schulte im Walde
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, 241-251
[pdf] [slides] [poster] [video] [bib]
- Lexical Semantic Change Discovery. 2021.
Sinan Kurtyigit, Maike Park, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
[pdf] [slides] [video] [bib]
- More than just Frequency? Demasking Unsupervised Hypernymy Prediction Methods. 2021.
Thomas Bott, Dominik Schlechtweg, Sabine Schulte im Walde
Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Findings)
[pdf] [bib]
- Regression Analysis of Lexical and Morpho-Syntactic Properties of Kiezdeutsch. 2021.
Diego Frassinelli, Gabriella Lapesa, Reem Alatrash, Dominik Schlechtweg, Sabine Schulte im Walde
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 21-27
[pdf] [bib]
- DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. 2021.
Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, Barbara McGillivray
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7079-7091
[pdf] [slides] [poster] [video] [blog] [bib]
- Explaining and Improving BERT Performance on Lexical Semantic Change Detection. 2021.
Severin Laicher, Sinan Kurtyigit, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 192-202
[pdf] [poster] [bib]
- Effects of Pre- and Post-Processing on type-based Embeddings in Lexical Semantic Change Detection. 2021.
Jens Kaiser, Sinan Kurtyigit, Serge Kotchourko, Dominik Schlechtweg
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 125-137
[pdf] [poster] [video] [bib]
- Challenges for Computational Lexical Semantic Change. 2021.
Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg, Haim Dubossarsky
Computational Approaches to Semantic Change
[pdf] [bib]
- CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not outperform SGNS on Semantic Change Detection. 2020.
Severin Laicher, Gioia Baldissin, Enrique Castaneda, Dominik Schlechtweg, Sabine Schulte im Walde
Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)
[pdf] [bib]
- OP-IMS @ DIACR-Ita: Back to the Roots: SGNS+OP+CD still rocks Semantic Change Detection. 2020.
Jens Kaiser, Dominik Schlechtweg, Sabine Schulte im Walde
Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)
[pdf] [slides] [video] [bib] Winning Submission!
- SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. 2020.
Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, Nina Tahmasebi
Proceedings of the 14th International Workshop on Semantic Evaluation
[pdf] [slides] [video] [bib]
- IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection. 2020.
Jens Kaiser, Dominik Schlechtweg, Sean Papay, Sabine Schulte im Walde
Proceedings of the 14th International Workshop on Semantic Evaluation
[pdf] [bib]
- CCOHA: Clean Corpus of Historical American English. 2020.
Reem Alatrash, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde
Proceedings of the 12th Language Resources and Evaluation Conference, 6958-6966
[pdf] [bib]
- Shared Task: Lexical Semantic Change Detection in German. 2020.
Adnan Ahmad, Kiflom Desta, Fabian Lang, Dominik Schlechtweg
CoRR
[pdf] [bib]
- Predicting Degrees of Technicality in Automatic Terminology Extraction. 2020.
Anna Hätty, Dominik Schlechtweg, Michael Dorna, Sabine Schulte im Walde
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
[pdf] [video] [bib]
- Simulating Lexical Semantic Change from Sense-Annotated Data. 2020.
Dominik Schlechtweg, Sabine Schulte im Walde
The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13)
[pdf] [bib]
- Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. 2019.
Haim Dubossarsky, Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 457-470
[pdf] [poster] [bib]
- Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling. 2019.
Dominik Schlechtweg, Cennet Oguz, Sabine Schulte im Walde
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 24-30
[pdf] [poster] [bib]
- A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains. 2019.
Dominik Schlechtweg, Anna Hätty, Marco del Tredici, Sabine Schulte im Walde
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 732-746
[pdf] [slides] [poster] [bib]
- SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction. 2019.
Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde
Proceedings of the 8th Joint Conference on Lexical and Computational Semantics, 1-8
[pdf] [bib]
- Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change. 2018.
Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 169-174
[pdf] [slides] [poster] [bib]
- Distribution-based prediction of the degree of grammaticalization for German prepositions. 2018.
Dominik Schlechtweg, Sabine Schulte im Walde
The Evolution of Language: Proceedings of the 12th International Conference (EVOLANGXII)
[pdf] [bib]
- Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection. 2017.
Vered Shwartz, Enrico Santus, Dominik Schlechtweg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 65-75
[pdf] [bib]
- German in Flux: Detecting Metaphoric Change via Word Entropy. 2017.
Dominik Schlechtweg, Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde, Daniel Hole
Proceedings of the 21st Conference on Computational Natural Language Learning, 354-367
[pdf] [bib]
- Exploitation of Co-reference in Distributional Semantics. 2016.
Dominik Schlechtweg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
[pdf] [bib]
Supervision
- Reem Alatrash
Computational Analysis of Syntactic and Semantic Variation in Kiezdeutsch (Master thesis).
[slides]
- Tejaswi Choppa
Organization of human annotation processes (Student researcher).
- Gioia Baldissin
Unsupervised detection of diatopic lexical semantic variation in Spanish (Master thesis).
[slides]
- Pedro G. Bascoy
DURel Annotation Tool (Student researcher).
[link]
- Christian Bartsch
Predicting Synchronic and Diachronic Semantic Generality with Models of Hypernymy (Bachelor thesis).
- Thomas Bott
Unsupervised Models of Hypernymy for German Subordinate Noun Phrases (Bachelor thesis).
[slides]
- Andres Cabero
Computational annotators of semantic proximity (Student researcher).
- Enrique Castaneda
Efficient Online Word-Sense Clustering on Human Relatedness Judgments (Bachelor thesis).
- Nishan Chatterjee
DURel Annotation Tool (Student researcher).
[link]
- Vaibhav Jain
Historical Word Sense Induction (Internship).
- Jens Kaiser
Dimensionality and Noise in Models of Semantic Change Detection (Bachelor thesis).
[pdf]
- Serge Kotchourko
Optimizing Human Annotation of Word Usage Graphs in a Realistic Simulation Environment (Bachelor thesis).
[slides]
- Sinan Kurtyigit
Lexical Semantic Change Discovery (Bachelor thesis).
[pdf] [slides]
- Severin Laicher
Historical Word Sense Clustering with Deep Contextualised Word Embeddings (Bachelor thesis).
[slides]
- Frank David Zamora Reina
Lexical Semantic Change Detection in Spanish (PhD thesis).
- Benjamin Tunc
Optimierung von Clustering von Wortverwendungsgraphen (Bachelor thesis).
[pdf] [slides]
- Lukas Theuer Linke
Visualization of Word Usage Graphs (Bachelor thesis).
- Nash Whaley
Standardization of semantic annotation processes (Student researcher).
- Tuo Zhang
Automating computational annotation of semantic proximity (Student researcher).
Open thesis topics (please approach me if interested)
- Detection of ambiguous word usages
Many word usages are ambiguous. Often such usages are regarded as noise and removed from datasets. However, in lexicographic analysis such usages are rather interesting cases, as they may be instances of a new sense. The student shall review relevant literature, define an ambiguity detection task, and use recent word sense annotation datasets to evaluate models on this task. The best models shall be implemented as annotators for an online system.
Primary references: P1, P2, P3
Secondary references: S1, S2, S3, S4
- Detection of semantic variation and sense number in word usage samples
Detecting the semantic variation in word usage samples is not a usual NLP task. However, semantic variation is a useful input to lexicographic analysis, where it can be used as a starting point to discover words with new senses. The student shall proceed by reviewing the literature on semantic variation, defining various tasks based on different measures of semantic variation (e.g. number of senses or average semantic proximity), and adapting and evaluating Word-in-Context models on these tasks [P3]. The best models shall be implemented as annotators for an online system.
Primary references: P1, P2, P3
Secondary references: S1
- Detection of non-recorded word senses
Dictionaries cover the senses of words at a particular point in time. When a word gains a new sense in a speaker community, its dictionary entry may become outdated. Lexicographers regularly check dictionaries for such outdated entries. The aim of the thesis will be to discover outdated dictionary entries in a modern German or Swedish dictionary by comparing target word usages from modern reference corpora to dictionary entries for the target word. The basic task to solve will be to decide whether the sense of a given word usage is covered by any of the dictionary entries or not. The main difficulty will be that a dictionary entry provides only limited information (sometimes only one example sentence besides the sense definition), which is not sufficient to train a standard Word Sense Disambiguation (WSD) system. Possible solutions shall be explored and tested, including zero-shot WSD systems [P1], unsupervised Word Sense Induction systems [P4] and unsupervised Lexical Semantic Change Detection systems [P5].
Primary references: P1, P2, P3, P4, P5
Secondary references: S1
- Detection of sense-representative uses in word usage samples
Word senses can be seen as sets of word uses with similar meanings. For the purpose of human annotation of these word senses or their description in dictionaries it can be helpful to pick representative uses from each word sense cluster, and even more helpful to do this automatically. Such a representative use should fulfill requirements such as clarity, non-ambiguity and agreement between annotators. I am not aware of any systematic approaches to automatically detect sense-representative word uses. The process of the thesis shall be to review literature, define the concept of sense-representativeness, create a small data set, define a task on the data set, define models for the task based on WSI, and evaluate these models on the task. The starting point can be the existing RefWUG data set, which uses sense-representative uses, and data sets for graded word sense annotation, such as DWUG LA and WSsim.
- Detection of annotation bots for semantic annotation systems
Online annotation services have the problem that automatic annotation bots imitate humans to create a large number of “fake” annotations generating large revenues for the attacker. The aim of the thesis will be to detect such annotation bots in the DURel annotation tool by analysing output of human and computational annotators. The outcome shall be an automatic detection mechanism in the DURel tool.
- Implementation of computational annotators for lexicographic tasks
A number of lexical semantic NLP tasks are relevant for lexicography, amongst them the Word-in-Context task asking to predict whether two uses of a word have the same meaning and the Lexical Semantic Change Detection task asking to predict whether two time-specific samples of uses have the same sense distribution. The aim of the thesis is to implement and evaluate simple and robust computational models for a range of lexicography-related semantic tasks (see open topics below), to integrate these into an online interface for lexicographic analysis and to evaluate the usefulness of the models for everyday lexicography.
- Open topics on computational lexical semantics and lexicography
Any topic of your choice relating to the lexical semantic tasks listed below:
Use-level:
- Word Sense Induction (WSI), based on WiC annotation
- Word Sense Disambiguation (WSD)
- Lexical Replacement Annotation
- Abstractness Level Annotation
- Sentiment Level Annotation
- Ambiguity Annotation
Use-pair-level:
- Semantic Proximity Annotation (aka Word-in-Context, WiC)
- Semantic Relation Annotation (SRA)
Lemma-level:
- Lexical Semantic Change Detection (LSCD), based on WSI or WSD
- Number of Senses Detection, based on WSI or WSD
- Change Type Detection, based on WSI or WSD + SRA
Sense-level:
- Detection of Sense-Representative Uses, based on WSI or WSD
- Generation of Sense Descriptions, based on WSI or WSD
- Prediction explanation for Lexical Semantic Change Detection
The LSCDBenchmark implements various automatic models of Lexical Semantic Change Detection to predict binary or graded change labels for target words from their uses in different time periods. However, historical linguists are interested not only in whether a word changed, but also in how it changed [1]. The aim of the thesis is to derive, from model predictions, descriptions of the senses which were lost, gained or changed in frequency, and to detect sense- or change-representative uses as well as information on the semantic relation between old and new senses.
- Annotation disagreement detection in Word Usage Graphs
Traditionally, annotator disagreements have been regarded as noise and were removed from data sets, but recent approaches try to treat disagreements as signal, using them for model training. Word sense annotation is an annotation task yielding comparably low inter-annotator agreement. The aim of the thesis is to analyze disagreements in recent word sense annotation datasets which represent uses of a word as nodes in a graph, connected by weighted edges representing (human-annotated) semantic proximity. These graphs can be clustered to infer word senses. The thesis shall examine how annotator disagreements can be detected and how they can be exploited to infer word senses on the annotated graphs.
- Adjustment of sense granularity for clustering on Word Usage Graphs
Word Usage Graphs represent uses of a word as nodes in a graph which are connected by weighted edges representing (human-annotated) semantic proximity. These graphs can be clustered to infer word senses. Adjusting the parameters of existing clustering approaches should make it possible to infer word senses of varying granularity. The process of the thesis shall be to review literature on sense granularity and clustering, create a small data set of word sense definitions with different granularities, and evaluate clustering solutions obtained on the graphs against the data set.
- Estimation of Jensen-Shannon Divergence for skewed probability distributions
The Jensen-Shannon Divergence (JSD) between word sense distributions is an important measure of semantic change [1]. Interestingly, it can be approximated from pairwise usage distances, avoiding the need to cluster usages [2]. The aim of the thesis is to formulate various approximations of the JSD and to estimate their bias empirically.
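As a point of reference for the last topic, the following is a minimal sketch of the standard JSD computed from clustered sense labels, i.e. the quantity an approximation would be compared against. It does not implement the approximation from pairwise usage distances mentioned in [2]. The function names and the toy sense labels are hypothetical and chosen only for illustration; the logarithm is taken to base 2, so the value lies between 0 and 1.

```python
import math
from collections import Counter


def sense_distribution(labels, senses):
    """Relative frequency of each sense in a sample of sense-labelled uses."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[s] / total for s in senses]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the average of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical sense labels for the uses of one target word in two time periods.
uses_t1 = ["sense_a", "sense_a", "sense_a", "sense_b"]
uses_t2 = ["sense_a", "sense_b", "sense_b", "sense_b"]
senses = sorted(set(uses_t1) | set(uses_t2))

p = sense_distribution(uses_t1, senses)
q = sense_distribution(uses_t2, senses)
print(jensen_shannon_divergence(p, q))
```

With skewed distributions (one dominant sense and a few rare ones), small samples can easily miss the rare senses entirely, which is one reason an empirical study of estimator bias is of interest here.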