The task
Finding threads in textual dialogs is emerging as a need for better organizing stored knowledge. The novel task of discovering ongoing conversations in scattered dialog blocks captures this need.
Current services organize dialogs into dialog blocks. These blocks are sequences of turns with a definite set of participants and either a precise temporal span (for messaging services) or a single subject (for emails). However, dialog blocks conceal the global picture of ongoing conversations, which are usually more important and span multiple dialog blocks spread across a variety of messaging platforms.
The task of discovering ongoing conversations in scattered dialog blocks aims to show coherent, ongoing conversations, instead of unrelated dialog blocks, by determining whether one dialog block follows another in the same dialog.
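The description above can be made concrete with a minimal sketch, assuming a dialog block is a sequence of turns and that candidate successor blocks are ranked for a given query block. The class names, the `participant_overlap` scorer, and the toy speakers are hypothetical illustrations. They are not the corpus format and not the features used in the cited paper.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Turn:
    speaker: str
    text: str


@dataclass(frozen=True)
class DialogBlock:
    block_id: str
    turns: Tuple[Turn, ...]

    @property
    def participants(self) -> Set[str]:
        # The "definite set of participants" of a block.
        return {turn.speaker for turn in self.turns}


def participant_overlap(a: DialogBlock, b: DialogBlock) -> float:
    """Jaccard overlap of speakers: a naive stand-in for a learned scorer."""
    union = a.participants | b.participants
    return len(a.participants & b.participants) / len(union) if union else 0.0


def rank_successors(query: DialogBlock, candidates: List[DialogBlock]) -> List[str]:
    """Order candidate block ids from most to least likely to continue `query`."""
    ranked = sorted(candidates,
                    key=lambda c: participant_overlap(query, c),
                    reverse=True)
    return [c.block_id for c in ranked]


# Toy blocks with invented speakers (not taken from the DOC corpus).
query = DialogBlock("q", (Turn("A", "Shall we go?"), Turn("B", "Not yet.")))
candidates = [
    DialogBlock("c1", (Turn("C", "Who is there?"), Turn("D", "A friend."))),
    DialogBlock("c2", (Turn("A", "And now?"), Turn("B", "Now we go."))),
    DialogBlock("c3", (Turn("B", "Wait here."), Turn("C", "As you wish."))),
]
ranking = rank_successors(query, candidates)
print(ranking)
```

A real system would replace `participant_overlap` with a trained model. The ranked list it returns is the kind of output that a recall-at-k measure evaluates.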
The corpus
To sidestep the privacy problems of big personal data, the DOC corpus is derived from theatrical plays. After all, nobody wants to see their private dialogs in a shared testbed.
Download the corpus: DOC v1.0
Reference
If you use this corpus, please cite this paper (available for download):
ACM Transactions on Interactive Intelligent Systems (TiiS), 2017
Startup kit for using the corpus in Python for Deep Learning architectures
Evaluator and Corpus Reader in Python and a simple Neural Network for Keras
Published results
Recall at k (R@k) of different models for discovering ongoing conversations, on the small (block size = 3) and medium (block size = 5) settings.
Paper | Description | small k=1 | small k=2 | small k=5 | small k=10 | medium k=1 | medium k=2 | medium k=5 | medium k=10 |
---|---|---|---|---|---|---|---|---|---|
Zanzotto&Ferrone, 2017 | PassiveAggressive with Group Features | 0.137 | 0.215 | 0.353 | 0.468 | 0.259 | 0.370 | 0.491 | 0.587 |
Zanzotto&Ferrone, 2017 | PassiveAggressive with Group + Stylistic + Distributed Syntactic features | 0.191† | 0.247† | 0.370† | 0.470 | 0.264 | 0.377 | 0.511 | 0.591 |
Zanzotto&Ferrone, 2017 | PassiveAggressive with Group + Stylistic + Distributional Semantic features | 0.182† | 0.236 | 0.373† | 0.462 | 0.284 | 0.369 | 0.487 | 0.578 |
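For readers unfamiliar with the metric, here is a minimal sketch of how R@k could be computed, assuming each query block has exactly one gold successor and the model returns a ranked list of candidate block ids. The function name `recall_at_k` and the toy identifiers are hypothetical. This is not the evaluator shipped in the startup kit.

```python
from typing import Dict, Sequence


def recall_at_k(rankings: Dict[str, Sequence[str]],
                gold: Dict[str, str],
                k: int) -> float:
    """Fraction of query blocks whose gold successor is among the top k candidates."""
    if not gold:
        raise ValueError("gold must contain at least one query block")
    hits = sum(1 for query, target in gold.items()
               if target in list(rankings.get(query, []))[:k])
    return hits / len(gold)


# Toy data (hypothetical block identifiers, not taken from the DOC corpus).
rankings = {
    "b1": ["b7", "b2", "b9"],
    "b2": ["b3", "b5", "b8"],
    "b3": ["b6", "b9", "b4"],
    "b4": ["b1", "b2", "b3"],
}
gold = {"b1": "b2", "b2": "b3", "b3": "b4", "b4": "b5"}

scores = {k: recall_at_k(rankings, gold, k) for k in (1, 2, 3)}
print(scores)
```

Under this definition R@k can only grow or stay the same as k increases, which matches the left-to-right trend within each setting of the table.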
Contact information
For comments and questions: Fabio Massimo Zanzotto