Discovering ongoing Conversations (DOC)

The task

Finding threads in textual dialogs is emerging as a need for better organizing stored knowledge. The novel task of discovering ongoing conversations in scattered dialog blocks addresses this need.

Current services organize dialogs into dialog blocks. These blocks are sequences of turns with a definite set of participants and with either a precise temporal span (for messaging services) or a single subject (for emails). However, dialog blocks conceal the global picture of ongoing conversations, which are usually more important and span multiple dialog blocks, possibly spread over a variety of messaging platforms.

The task of discovering ongoing conversations in scattered dialog blocks aims to show coherent, ongoing conversations instead of unrelated dialog blocks, by determining whether two dialog blocks are subsequent in a dialog, that is, whether one block directly continues the other.
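To make the notions of dialog block and subsequent blocks concrete, here is a minimal Python sketch. It is an illustration only: the names `Turn`, `DialogBlock`, `split_into_blocks` and `is_subsequent` are hypothetical, and cutting a dialog into non-overlapping blocks of a fixed number of turns is an assumption based on the "block size" reported with the published results, not necessarily how the DOC corpus was built.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Turn:
    speaker: str
    text: str


@dataclass(frozen=True)
class DialogBlock:
    dialog_id: str            # hypothetical identifier of the source dialog
    position: int             # index of the block inside that dialog
    turns: Tuple[Turn, ...]   # the turns grouped in this block


def split_into_blocks(dialog_id: str, turns: Sequence[Turn],
                      block_size: int) -> List[DialogBlock]:
    """Cut a dialog into consecutive, non-overlapping blocks of block_size turns.

    Trailing turns that do not fill a whole block are dropped (an assumption).
    """
    blocks = []
    for start in range(0, len(turns) - block_size + 1, block_size):
        chunk = tuple(turns[start:start + block_size])
        blocks.append(DialogBlock(dialog_id, start // block_size, chunk))
    return blocks


def is_subsequent(first: DialogBlock, second: DialogBlock) -> bool:
    """Gold label: True if `second` directly continues `first` in the same dialog."""
    return (first.dialog_id == second.dialog_id
            and second.position == first.position + 1)


# Toy dialog with seven turns, cut into blocks of three turns each.
toy_turns = [Turn("A" if i % 2 == 0 else "B", f"line {i}") for i in range(7)]
toy_blocks = split_into_blocks("toy-play", toy_turns, block_size=3)
```

In this sketch a model never sees `dialog_id` or `position`; they only define the gold label that a classifier or ranker would have to predict from the turns themselves.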

The corpus

To get around the otherwise insurmountable problem of privacy of big personal data, the DOC corpus is derived from theatrical plays. After all, nobody wants to see their private dialogs in a shared testbed.

Download the corpus: DOC v1.0

Reference

If you use this corpus, please cite this paper (available for download):

Startup kit for using the corpus in Python for Deep Learning architectures

Evaluator and Corpus Reader in Python and a simple Neural Network for Keras

Published results

Recall at k (R@k) of different models for discovering ongoing conversations.


                                                                                                     small (block size = 3)          medium (block size = 5)
Paper                   Description                                                                  k=1     k=2     k=5     k=10    k=1     k=2     k=5     k=10
Zanzotto&Ferrone, 2017  PassiveAggressive with Group Features                                        0.137   0.215   0.353   0.468   0.259   0.370   0.491   0.587
Zanzotto&Ferrone, 2017  PassiveAggressive with Group + Stylistic + Distributed Syntactic features    0.191†  0.247†  0.370†  0.470   0.264   0.377   0.511   0.591
Zanzotto&Ferrone, 2017  PassiveAggressive with Group + Stylistic + Distributional Semantic features  0.182†  0.236   0.373†  0.462   0.284   0.369   0.487   0.578
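As a reading aid for the R@k figures, the following Python sketch computes recall at k under one common definition: the fraction of query blocks whose correct follow-up block appears among the k top-ranked candidates. The function name `recall_at_k`, the dictionary layout and the toy block identifiers are assumptions made for this illustration; the evaluator in the startup kit may follow a different protocol.

```python
from typing import Dict, Hashable, Sequence


def recall_at_k(rankings: Dict[Hashable, Sequence[Hashable]],
                gold: Dict[Hashable, Hashable],
                k: int) -> float:
    """Fraction of queries whose gold follow-up is among the top-k candidates.

    rankings maps each query block id to its candidate block ids, best first.
    gold maps each query block id to the id of its true follow-up block.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rankings:
        raise ValueError("rankings must not be empty")
    hits = sum(1 for query, ranked in rankings.items()
               if gold[query] in list(ranked)[:k])
    return hits / len(rankings)


# Toy example with three query blocks and three ranked candidates each.
toy_rankings = {
    "b1": ["b7", "b2", "b9"],
    "b4": ["b5", "b3", "b8"],
    "b6": ["b1", "b3", "b7"],
}
toy_gold = {"b1": "b2", "b4": "b5", "b6": "b7"}

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_rankings, toy_gold, k):.3f}")
```

With this definition R@k can only grow as k grows, which matches the pattern in each row of the table.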

Contact information

For comments and questions: Fabio Massimo Zanzotto