UniWay dataset simulates the interaction between students and the university’s back-office staff. The data consists of questions made by graduate students (including Erasmus students) of the Athens University of Economics and Business (AUEB)1 and the Aristotle University of Thessaloniki (AUTH)2 collected with a fully anonymous and GDPR compliant process. Students were asked to provide two versions of their questions in two English and Greek, whenever it was possible.
Based on the information provided by the staff working at the back offices of the two universities, we distinguished six different types of questions, which correspond to six different student intents as shown in Table 1. The same table contains the distribution of questions per intent in each of the two UniWay datasets (i.e. EN and GR). The English version of the dataset is slightly larger because some of the students who participated did not speak Greek.
Question Type | Intent | Greek | English |
---|---|---|---|
What is a course’s weight in the final grade? | I1 AvailableCourses | 365 | 523 |
What are the available courses for the following semester? | I2 CourseCredit | 361 | 380 |
What courses I have passed in this exam period? | I3 PassedCourses | 362 | 360 |
Get my grade in a course | I4 GradeCourse | 362 | 366 |
Get the contact info of a tutor. | I5 TutorInfo | 364 | 362 |
Get the tutor of a course | I6 TutorCourse | 362 | 346 |
Total | 2176 | 2337 |
Table 1: The distribution of intent labels in the Greek and English versions of the UniWay dataset
The intent of each question has been manually defined by a team of experts. In order get an indication of the difficulty of the IE task, we asked two annotators to examine the same subset of English and Greek queries (120 queries each) and make a decision on the intent of each query. The inter-annotator agreement on this subset was 90% for the English dataset and 85% for the Greek one. The disagreements were mainly due to the fact that some sentences were not very clear, were missing key entities or terms that would help to decide on their intent, and thus could be interpreted differently. These ambiguous sentences mostly feel under the intents I1 (AvailableCourses) and I3 (PassedCourses). Nevertheless, a successful conversational agent should be able to simulate non-clear sentences in the most realistic way, so we decided not to have an UnknownIntent class in our dataset and such cases have been resolved by asking reviewers to reach a consensus.
The questions were then converted to lower case, cleared from punctuation and accents (the Greek utterances) and annotated for entities. Each word token which describes a named entity is matched with an entity from a predefined list, using IOB tags (e.g., begin, inside) and labels (e.g., tutor name), otherwise is matched with O (the “null” tag), as shown in the examples of Figure 1.
Figure 1: Sample questions with entities and intents from the UniWay EN dataset
The distribution of the 6 entities in each dataset is shown in Table 2. We can observe that intents are almost equally distributed in contrast to entities. Entity teacher name i appears very few times due to the fact that teachers are rarely searched by their first name.
Entities | Greek | English |
---|---|---|
exam period b | 409 | 479 |
exam period i | 488 | 532 |
teacher name b | 322 | 308 |
teacher name i | 21 | 36 |
course name b | 1032 | 969 |
course name i | 2030 | 1757 |
Table 2: The distribution of entity type labels in the Greek and English versions of the UniWay dataset
1https://www.dept.aueb.gr/cs
2https://www.csd.auth.gr/en
Citations
If you use our dataset in your research or find our repository useful, please cite our work
S. Rizou, A. Theofilatos, A. Paflioti, E. Pissari, I. Varlamis, G. Sarigiannidis, K.Ch. Chatzisavvas,
Efficient intent classification and entity recognition for university administrative services employing deep learning models,
Intelligent Systems with Applications,
Volume 19,
2023,
200247,
ISSN 2667-3053,
https://doi.org/10.1016/j.iswa.2023.200247
License
Uniway Dataset is available under Creative Commons BY-NC-SA 4.0 license
Download
Download the files of UNIWAY EN and UNIWAY GR here