Polish AI is Operational. It Was Developed by Scientists from Our Faculty

Date: 07.03.2025 Category: General

It can summarise documents, advise users of the mObywatel application, and search for necessary information – meet PLLuM, a large language model developed by a consortium led by Wrocław University of Science and Technology, headed by scientists from the ICT Faculty. The solutions developed by Polish researchers are accessible to everyone.

PLLuM (Polish Large Language Universal Model) is a project to build a large language model based on artificial intelligence technology. It lets users process and generate texts in Polish, and it is intended to support the development of digital skills and innovation in public administration and business.

For specific Polish needs

Prof. Tomasz Kajdanowicz, head of the Department of Artificial Intelligence, has no doubt that developing a Polish model was necessary and that the model is worth developing further. “We can use other applications today, but the question is: do we want Poland's economy, administration, and science to be entirely dependent on foreign solutions?” asks the scientist.

He also observes that the currently available foreign models are not neutral, and that their development is driven by the interests of companies outside Poland, which do not disclose information about their products. As a result, they may be filtering the content they provide and may not fully account for the Polish cultural and historical perspective.

– There are already studies that show this problem. PLLuM is not a one-off project: the long-term development of AI in Poland will lead to many investments in future technological solutions, as well as in the training of Polish experts and scientists in this field – emphasises Prof. Kajdanowicz.

Almost a year of hard work

Work on PLLuM began in 2023, initiated by scientists from our Faculty who had previously been involved in language technology research as part of the CLARIN-PL project. Its results included, among other things, research infrastructure used mainly in the humanities and social sciences. Over the last five years, researchers have been working on a wide variety of databases and natural language processing tools.

Ultimately, PLLuM was developed not only by scientists from our Faculty, but also by specialists from the Institute of Computer Science of the Polish Academy of Sciences, the Institute of Slavic Studies of the Polish Academy of Sciences, the NASK Scientific and Academic Computer Network, the Information Processing Centre, and the University of Łódź. The project was granted over 14 million PLN by the Ministry of Digital Affairs.

– In eleven months, we have built an entire family of language models that are used for natural language processing and have a wide range of applications, including in information extraction and limited content understanding – explains Prof. Maciej Piasecki from the Department of Artificial Intelligence, project coordinator.

– We have also created a special version of the model that can answer questions based on any document database. It has already been tested by the Ministry of Digital Affairs, providing information about matters related to public institutions, particularly about what is available within the mObywatel app. And it can boast a very high level of effectiveness – he adds.

The chatbot model has already been made available in a browser-based form at http://pllum.clarin-pl.eu/, where anyone can test it.

The Polish language model is available in many versions, ranging from 8 to 70 billion parameters, which enables flexible and scalable adaptation to user needs. It also allows precise content generation in Polish. Smaller versions work well for quick tasks, while larger models offer higher precision and contextual consistency in understanding the Polish language.

– It should be emphasised that we did not develop PLLuM as a competitor to existing language models, but as our own proprietary Polish engine over which we have full control and know exactly what it contains. We also have full control over the process of its construction, customisation, and implementation. Our public institutions and companies can thus be confident in the security of their data. Working on this solution, we also gained a wealth of knowledge that will undoubtedly pay off in the future – notes Prof. Piasecki.

Exceptional clarity

Commercial versions utilise text resources from owners who have licensed them to the consortium, as well as resources that can be used to build a fully open model. Scientific models also utilise publicly available datasets, such as Common Crawl.

– Our model is very transparent in legal terms, which makes it unique. I don't know of another solution like this. We didn't take any shortcuts; we conducted numerous legal analyses, collaborated with the Ministry of Justice, and we are confident that we are using all data in compliance with current law – emphasises Prof. Piasecki.

Compared to other language models, PLLuM also stands out because it is tailored to the specifics of the Polish language and public administration terminology. It relies on manually developed data rather than on other language models. In total, the scientists have prepared over 50,000 manually formulated instructions for training models and approximately 130,000 so-called preferences, which are examples of conversations prepared by people of various ages.

– As a result, the model trained on Polish resources handles inflection and complex syntax very well, all to ensure that its responses are as precise and natural as possible – adds the researcher.

Time to implement stage number two

The continuation of the PLLuM project is named HIVE AI and will be coordinated by the NASK Scientific and Academic Computer Network – National Research Institute. Apart from Wrocław University of Science and Technology, the project will be carried out by ACK CYFRONET AGH, the Central IT Centre, the Institute of Computer Science of the Polish Academy of Sciences, the Institute of Slavic Studies of the Polish Academy of Sciences, the Information Processing Centre, and the University of Łódź.

The Ministry of Digital Affairs has granted the consortium a subsidy of 19 million PLN for further development of the project.

In the future, PLLuM will, among other things, help users obtain information in the mObywatel application by automating document processing, content analysis, and information retrieval. It will also facilitate the development of educational applications and translations, which may assist teachers in conducting lessons using the latest technologies.

The HIVE AI programme will consist of four stages: the construction of language data corpora for foundational training, the training of large language models, evaluation, and the pilot implementation of models in the public sector. Work will continue until the end of 2025.

Faculty of Information and Communication Technology