TFCSG: An Unsupervised Approach for Question-retrieval Over Multi-task Learning
- Most question-answering (QA) systems rely on training data to reach optimal performance, but acquiring training data for supervised systems is time-consuming and resource-intensive. To address this, we propose TFCSG, an unsupervised similar-question retrieval approach that leverages pre-trained language models and multi-task learning. First, topic keywords are extracted sequentially from question sentences with a latent topic-filtering algorithm to construct an unsupervised training corpus. Then, a question retrieval model is built with multi-task learning over three tasks: (1) short-sentence contrastive learning, (2) judging the similarity between a question sentence and its corresponding topic sequence, and (3) generating the corresponding topic sequence from a question sentence. The three tasks train the language model in parallel. Finally, similar questions are retrieved by computing the cosine similarity between sentence vectors. Comparative experiments on public question datasets show that TFCSG outperforms the unsupervised baseline used for comparison. Because no manual annotation is required, the approach substantially reduces human effort.
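The final retrieval step (ranking candidate questions by the cosine similarity of their sentence vectors) and a contrastive objective of the kind the first task could use can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the contrastive loss, so the in-batch InfoNCE form, the temperature of 0.05, and the helper names `cosine_similarity`, `retrieve_similar`, and `info_nce_loss` are all hypothetical. The short lists of floats stand in for the sentence embeddings the trained language model would produce.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve_similar(query: Vector, corpus: List[Vector], k: int = 3) -> List[int]:
    """Return indices of the k corpus vectors most similar to the query."""
    scored = [(cosine_similarity(query, c), i) for i, c in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """Hypothetical in-batch InfoNCE loss (assumed form; not stated in the abstract).

    anchors[i] should match positives[i]; every other positive in the
    batch serves as a negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine_similarity(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


if __name__ == "__main__":
    # Toy "sentence vectors" for four stored questions and one new query.
    corpus = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    query = [1.0, 0.05, 0.0]
    print(retrieve_similar(query, corpus, k=2))  # → [0, 1]
```

In a full system the vectors would come from the fine-tuned encoder, and the three task losses would typically be combined (for example, summed) during parallel training; the abstract does not say how TFCSG weights them.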
Authors (HS Reutlingen) | Rätsch, Matthias; Danner, Michael |
---|---|
DOI: | https://doi.org/10.23919/SICE59929.2023.10354081 |
Published in: | 2023 62nd Annual Conference of the Society of Instrument and Control Engineers (SICE) |
Publisher: | IEEE |
Place of publication: | Piscataway, NJ |
Document Type: | Conference proceeding |
Language: | English |
Publication year: | 2023 |
Number of pages: | 6 |
First Page: | 610 |
Last Page: | 615 |
DDC classes: | 004 Computer science |
Open access?: | No |
Licence: | In Copyright (protected by copyright) |