
The Airline Travel Information System (ATIS) corpus is a popular benchmark dataset in named entity recognition (NER) research [1]. Its domain is airline travel, and its intents include finding a flight, enquiring about airlines or their services, and asking about ticket prices, among others. The ATIS pilot corpus was designed to measure progress in Spoken Language Systems that include both a speech and a natural language component. It has therefore been used extensively to train, test and develop NER and intent classification (IC) networks and question answering systems, for both spoken and written language. Because the dataset consists of audio recordings with their corresponding manual transcripts, and every utterance is fully labelled, it suits both spoken-language and written-language approaches.

The English ATIS dataset used here is a modified version from which 1290 duplicate utterances have been removed, resulting in a clean version of 5473 unique sentences. This processed version has been translated into Greek. For the translation, we harvested the textual data of ATIS and translated it into Greek, employing a panel of 5 persons to optimize the quality of the translation. All words were lowercased and the accents were removed. Table 1 presents an example ATIS sentence in both English and Greek, along with the corresponding Inside/Outside/Beginning (IOB) representation.
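The preprocessing described above (duplicate removal, lowercasing, accent removal) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the pipeline actually used to build the dataset; the function names `normalize` and `deduplicate` are hypothetical, and treating accents as Unicode combining marks is an assumption about how the accents were stripped.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text and strip accent marks (assumed preprocessing)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop the combining marks (Unicode category "Mn"), keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever remains into the canonical composed form.
    return unicodedata.normalize("NFC", stripped)


def deduplicate(utterances):
    """Remove repeated utterances, keeping the first occurrence of each."""
    # dict keys preserve insertion order, so the original ordering survives.
    return list(dict.fromkeys(utterances))


# Hypothetical input: an accented, capitalized Greek phrase.
example = normalize("Δείξτε τις πτήσεις της Κυριακής")
```

Note that stripping every combining mark also removes diacritics other than the Greek tonos, which may or may not match the original procedure.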

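Table 1. Intent: atis_flight (EN), atis_flight (GR)

| Sentence (EN) | Named entity | Sentence (GR) | Named entity |
|---|---|---|---|
| Show | O | δειξτε | O |
| Sunday | B-depart_date.day_name | τις | O |
| flights | O | πτησεις | O |
| from | O | της | O |
| Seattle | B-fromloc.city_name | κυριακης | B-depart_date.day_name |
| to | O | απο | O |
| Chicago | B-toloc.city_name | σιατλ | B-fromloc.city_name |
| | | για | O |
| | | σικαγο | B-toloc.city_name |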
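To make the IOB representation concrete, the following Python sketch groups IOB-tagged tokens into labelled entity spans. It is an illustrative helper rather than part of the dataset or of the authors' code: the function name `extract_entities` is hypothetical, and the choice to drop an `I-` tag that does not continue an open span of the same label is an assumption.

```python
def extract_entities(tokens, tags):
    """Group IOB-tagged tokens into (label, text) entity spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities = []
    label, span = None, []

    def close_span():
        # Store the currently open span, if there is one.
        if label is not None:
            entities.append((label, " ".join(span)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            close_span()
            label, span = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag continues the open entity with the same label.
            span.append(token)
        else:
            # "O", or an I- tag with no matching open span, ends the entity.
            close_span()
            label, span = None, []

    close_span()
    return entities


# The Greek sentence from the example table above.
tokens = ["δειξτε", "τις", "πτησεις", "της", "κυριακης",
          "απο", "σιατλ", "για", "σικαγο"]
tags = ["O", "O", "O", "O", "B-depart_date.day_name",
        "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
entities = extract_entities(tokens, tags)
```

For this sentence, `entities` holds three single-token spans: the departure day name, the origin city and the destination city.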

[1] P. Price, Evaluation of spoken language systems: The ATIS domain, in: Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24–27, 1990, 1990.

Citations

If you use our dataset in your research or find our repository useful, please cite our work:

S. Rizou, A. Paflioti, A. Theofilatos, A. Vakali, G. Sarigiannidis, K.Ch. Chatzisavvas,
Multilingual Name Entity Recognition and Intent Classification employing Deep Learning architectures,
Simulation Modelling Practice and Theory,
Volume 120,
2022,
102620,
ISSN 1569-190X,
https://doi.org/10.1016/j.simpat.2022.102620

License

The ATIS GR Dataset is available under the Creative Commons BY-NC-SA 4.0 license.

Download

Download the CSV files of ATIS EN and ATIS GR here