Faculty of Information and Communication Technology

The plWordNet Dictionary Awarded by the Polish Academy of Sciences

Date: 05.01.2023 Category: General

The team of linguists and IT specialists led by Prof. Maciej Piasecki from the Department of Artificial Intelligence has received a distinction from the President of the Polish Academy of Sciences for creating the plWordNet dictionary, which contains over 190,000 entries, 285,000 meanings and over 600,000 relationships.

According to Prof. Piasecki, the coordinator of the CLARIN-PL research infrastructure, it is one of the largest dictionaries of the Polish language in history and one of the two largest wordnet dictionaries in the world – and it is still growing.

The plWordNet is a relational semantic dictionary that reflects the lexical system of the Polish language. Our scientists created it not only for IT specialists who deal with, for example, text processing, but also for ordinary language users and foreigners learning Polish. It is an interactive dictionary that can be navigated effectively both by people and by computer programs. The dictionary can be downloaded for free or browsed online.

The dictionary developers have described over 190,000 Polish words (headwords) by showing their connections with other words. More than 600,000 relationships of various types have been recorded between words. Over 80,000 lexical items are also annotated with emotional markers (positive, negative, ambiguous or neutral), as well as with the emotions they evoke and the fundamental values they represent.

– Dictionary entries are built in a very simple way, e.g. the word car is associated with synonyms such as auto or automobile. Different types of cars are also listed separately: bus, taxi, convertible, as are more general terms that include the concept of a car: a double-track vehicle or a means of transport. The words associated with car include parts of a car: engine, windshield washer, chassis, as well as words used synonymously with it: ride and wheels – explains Prof. Maciej Piasecki. He adds that individual meanings in the plWordNet are connected by mutual lexical and semantic relationships (a total of 57 types and 107 subtypes), and this is how a network is created in which each word is defined by reference to other words.
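The structure described above can be illustrated with a minimal Python sketch. It is not the plWordNet data format or API: the class `MiniWordnet`, its methods and the relation labels (`synonym`, `hyponym`, `hypernym`, `meronym`) are assumptions made for illustration, using standard wordnet terminology for the links the professor lists.

```python
from collections import defaultdict


class MiniWordnet:
    """Toy relational dictionary: a word is described only by its links to other words."""

    def __init__(self):
        # relations[relation_type][source_word] -> set of target words
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Record one typed link from source to target."""
        self.relations[relation][source].add(target)

    def related(self, word, relation):
        """Return the targets of one relation type, sorted alphabetically."""
        return sorted(self.relations.get(relation, {}).get(word, set()))

    def describe(self, word):
        """Return every relation type recorded for the word, with its targets."""
        return {
            relation: sorted(targets[word])
            for relation, targets in self.relations.items()
            if word in targets
        }


# Hypothetical entry built from the examples quoted in the article.
wn = MiniWordnet()
for synonym in ("auto", "automobile"):
    wn.add("car", "synonym", synonym)
for kind in ("bus", "taxi", "convertible"):
    wn.add("car", "hyponym", kind)          # more specific terms
wn.add("car", "hypernym", "means of transport")  # more general term
for part in ("engine", "chassis"):
    wn.add("car", "meronym", part)          # parts of a car

print(wn.describe("car"))
```

Storing links per relation type keeps each lookup a simple dictionary access, which is one plausible reason a program can navigate such a network as easily as a person can.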

The plWordNet can also be used as a Polish-English and English-Polish dictionary, because it has been linked to the first and, for years, the largest wordnet in the world – the Princeton WordNet. It is also a very important resource in natural language processing and in research on artificial intelligence; at one point it was used, among others, in the automatic Google Translate services. The plWordNet is developed by linguists who are supported by IT tools built for exploring very large databases of texts (over 4.5 billion words). Programs developed at Wrocław University of Science and Technology learn, among other things, word meanings from a huge database of texts and propose meaning descriptions for approval by linguists.
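How a link between two wordnets can serve as a bilingual dictionary may be sketched as follows. This is only an illustration under stated assumptions: the sense identifiers, the three tables and the function `translate` are hypothetical and do not reflect the actual plWordNet or Princeton WordNet identifiers or file formats.

```python
# Hypothetical data: each Polish word points to one or more Polish senses.
PL_SENSES = {
    "samochód": ["pl:car-1"],
    "zamek": ["pl:castle-1", "pl:lock-1"],  # an ambiguous Polish word
}

# Hypothetical data: each English sense lists the English words expressing it.
EN_SENSES = {
    "en:car-1": ["car", "auto", "automobile"],
    "en:castle-1": ["castle"],
    "en:lock-1": ["lock"],
}

# Hypothetical links between Polish senses and English senses.
INTERLINGUAL = {
    "pl:car-1": "en:car-1",
    "pl:castle-1": "en:castle-1",
    "pl:lock-1": "en:lock-1",
}


def translate(polish_word):
    """Return English candidate words for each linked sense of a Polish word."""
    result = {}
    for sense in PL_SENSES.get(polish_word, []):
        target = INTERLINGUAL.get(sense)
        if target is not None:
            result[sense] = EN_SENSES.get(target, [])
    return result


print(translate("zamek"))
```

Because the links connect meanings rather than words, an ambiguous word such as zamek yields separate translations for each meaning instead of one mixed list.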

– The plWordNet can be used for automatic translation and for text or speech analysis, especially semantic analysis, including knowledge extraction. The dictionary could help programmers create more effective and intelligent search engines or better manage information in document databases. The dictionary is also intended to help in the development of the so-called Semantic Internet. Maybe the language we use to describe words is not precise, but it is enough to help in the analysis of texts – says the researcher.

The plWordNet dictionary is modelled on the American Princeton WordNet, which was the first and, for years, the largest dictionary of this type (it contains about 150,000 entries). Initially, WordNet, developed in the 1980s, was intended only for studying how children learn the meanings of words. Over time, it turned out to have many more applications.

According to Prof. Piasecki, many countries developing their own wordnets decide simply to translate the American one. The Polish researchers, by contrast, decided to develop the dictionary from scratch, based on very large corpora (collections) of texts. As a result, it reflects the realities of the Polish language more faithfully.

– I hope we'll reach 200,000 words. This is the size of the largest Polish language dictionary, hence our goal. But we want to search for the natural boundaries of language – adds Prof. Maciej Piasecki. The plWordNet already exceeds the linguistic competence of most native speakers of Polish. Work on it has so far taken a total of over 50 person-years and is carried out by a unique interdisciplinary team that has been working continuously since 2005.

The Polish wordnet is being built through a joint effort of lexicographers and IT specialists from the Language Technology Group at Wrocław University of Science and Technology. By decision of the University authorities, the plWordNet is available free of charge for public use (including commercial use) under a license modelled on the Princeton WordNet license. Users can browse the plWordNet using a mobile application and the WordNetLoom-Viewer (an application for viewing the network of meanings in the plWordNet). They can also download the source files. Software developers additionally have access to the plWordNet through a web service and a programming API.

Work on the plWordNet is funded by grants from the Ministry of Science and Higher Education and EU funds, and is currently carried out as part of the pan-European research infrastructure CLARIN and the Polish consortium CLARIN-PL.

Politechnika Wrocławska © 2024