Faculty of Information and Communication Technology

Exposing Fake News, or the SWAROG Project

Date: 27.09.2021 Category: Science

The development of a system detecting sources of deliberate disinformation is the main goal of the SWAROG project, carried out by scientists from the Faculty of Electronics at Wrocław University of Science and Technology. Their research has been awarded a grant of over PLN 8.6 million from the National Centre for Research and Development.

The funding was granted under the Strategic Program for Scientific Research and Development "Advanced Information, Telecommunications and Mechatronic Technologies" (Infostrateg I). In total, grants worth nearly PLN 55 million were awarded to ten projects, and the application prepared by our scientists was the only one to receive the maximum number of points.
The consortium that will implement the project, apart from Wrocław University of Science and Technology, also includes MATIC Inc. and Bydgoszcz University of Science and Technology.

Tracking disinformation

The “Artificial Intelligence Disinformation Detection System” (SWAROG) project focuses on detecting sources of intentional disinformation using tools that can be developed with machine learning methods.
The project is divided into three phases. In the first, our scientists will verify whether the idea can be implemented and the assumed results achieved.

“From the available literature on the subject, which we have also contributed to a bit in recent years, we know that the implementation of an automatic fake news detection system is possible, but its basic limitation is the difficulty in obtaining appropriate data to build a reliable predictive system,” explains Paweł Ksieniewicz, Ph.D. from the Department of Network and Computer Systems at the Faculty of Electronics, who will lead the research and development works.

Importantly, the problem is not to create a large enough data set that could be used by researchers to build recognition models, but to reliably label it so that the artificial intelligence system is given the ability to effectively distinguish between fact and attempted disinformation.

Therefore, the greatest challenge of the first phase will be to develop mechanisms for objective content labelling and to use them to acquire a large and reliable data set in Polish, which will span a long period of publication dates and thus be the first available fake news corpus of this type.

“Simultaneously, using the collections available to the scientific community in English and – to a large extent – our own original recognition methods, we will attempt the construction of a universal document-processing architecture for the purposes of fake news classification,” adds the scientist.
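To make the idea of fake news classification concrete, here is a minimal sketch of a text classifier. It is not the SWAROG architecture, which the article does not describe; it swaps in a plain multinomial Naive Bayes over word counts, and the four training sentences and their labels are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplest possible preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus; a real system needs a large, reliably labelled one.
train_docs = [
    "miracle cure doctors hate shocking secret",
    "shocking secret they hide from you",
    "ministry publishes annual budget report",
    "university announces research grant results",
]
train_labels = ["fake", "fake", "reliable", "reliable"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("shocking miracle secret"))
print(model.predict("annual research report"))
```

The sketch also shows why labelling matters more than volume: the model can only reproduce whatever distinction the labels encode, so unreliable labels yield an unreliable classifier regardless of corpus size.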

The first phase will conclude with the implementation of a system prototype that handles documents in English and runs as a web service.

Studies of English and Polish content

During the next phase, the document-processing architecture will be expanded so that the system prototype can also recognize content in Polish. A further task will be to develop a comprehensive method for predicting the spread of malicious content. This will allow a social context to be built for each analysed document, providing additional, independent information about the identified objects.

“In the last phase of the project, we will try to meet the challenges that each system using artificial intelligence faces after it reaches the commercial market,” explains Paweł Ksieniewicz, Ph.D. “We must bear in mind that the nature of knowledge is its historicity, which manifests itself in continuous, typically fluid changes in the definitions of the concepts it describes. As a result, any recognition system becomes in a sense obsolete already when it is made available to end users,” he admits.

This means that as the system is used, the quality of its decisions gradually degrades, and in an extreme situation it drops to the level of a random classifier, which, instead of a once almost perfect decision, has nothing more to offer than a blind shot.

“In the field of pattern recognition, such changes are called concept drifts. The methods of counteracting their negative impact on the quality of recognition models are dealt with in the sub-field of data stream processing, which in recent years has been one of the basic research topics undertaken by the Machine Learning Team, Department of Network and Computer Systems. This is where we will use our experience on the one hand to develop a methodology for its maintenance and ongoing evaluation in the final version of the system, and on the other hand to provide it with the option of adapting it to the phenomenon of concept drift,” emphasizes the scientist.
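The effect of concept drift described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the team's method: the two-feature synthetic stream, the `OneFeatureClassifier` model, and the sliding-window size are all hypothetical. The label depends on the first feature before the drift and on the second one after it. A model frozen after its initial training is compared with one that is refitted on a sliding window of recent samples, using test-then-train (prequential) evaluation.

```python
import random
from collections import deque


def make_stream(n, drift_at, seed=0):
    """Yield ((x0, x1), label); the label follows x0 before the drift, x1 after."""
    rng = random.Random(seed)
    for i in range(n):
        x = (rng.random(), rng.random())
        informative = 0 if i < drift_at else 1
        yield x, x[informative] > 0.5


class OneFeatureClassifier:
    """Predicts from whichever single feature agreed best with the labels seen."""

    def __init__(self):
        self.feature = 0

    def fit(self, samples):
        scores = [sum((x[f] > 0.5) == y for x, y in samples) for f in (0, 1)]
        self.feature = 1 if scores[1] > scores[0] else 0

    def predict(self, x):
        return x[self.feature] > 0.5


def prequential(adaptive, n=1000, drift_at=500, window=50):
    """Test-then-train evaluation; returns accuracy before and after the drift."""
    model = OneFeatureClassifier()
    recent = deque(maxlen=window)  # sliding window of the latest samples
    hits_before = hits_after = 0
    for i, (x, y) in enumerate(make_stream(n, drift_at)):
        correct = int(model.predict(x) == y)  # test on the sample first
        if i < drift_at:
            hits_before += correct
        else:
            hits_after += correct
        recent.append((x, y))  # then use it for training
        if adaptive or i < window:  # the static model stops learning early
            model.fit(recent)
    return hits_before / drift_at, hits_after / (n - drift_at)


static_before, static_after = prequential(adaptive=False)
adaptive_before, adaptive_after = prequential(adaptive=True)
print(f"static:   before={static_before:.2f} after={static_after:.2f}")
print(f"adaptive: before={adaptive_before:.2f} after={adaptive_after:.2f}")
```

In this setup the frozen model falls to roughly chance level once the concept changes, while the windowed model recovers after a short lag. Real drift is rarely this abrupt, and practical data stream methods are considerably more elaborate than a fixed sliding window.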

Implementation in two models

The responsibility for the implementation of the project, which is scheduled for April 2025, will rest on MATIC, and the solution will be offered in two models.

The first will be a cloud-based model, which is best suited to the modern market. The solution will be provided as a data stream service in which fake news is detected. The recipient will configure the service from default (built-in) streams or from their own sources, using dedicated mechanisms for connecting to data sources tailored to the given customer.
The second commercialization model will consist in on-site implementations – in a dedicated customer infrastructure. Such a model will be offered to those recipients whose requirements regarding the amount of data, the scope of integration with internal systems or the confidentiality of data processing exclude cloud solutions, i.e. primarily to government agencies and to large institutional clients such as, for example, the Polish Press Agency. A market analysis allowed us to estimate that approximately 80% of customers will choose the cloud-based implementation.

An innovative system, but for whom?

The system will be sold to publishers, including social media platforms and newsrooms. The group of potential clients in Poland includes, among others, TVP, Polish Radio, Polish Press Agency, Onet.pl, Polska Press, RMF FM, Agora, Polsat and TVN.

The commercialization plan also provides for a pilot implementation in association with the Polish Platform for Homeland Security (ppbw.pl). For selected topics (such as vaccinations, currently a very important element of the public debate) the institution will carry out a public mission, at the same time supporting the commercialization of the full product.

The implementation of the project will start on 1 October 2021. The project will last 42 months: 36 months for implementation by the contractor and a total of 6 months for the NCBR to assess the reports summarizing the first two phases. Each of the three research phases will last 12 months, with a three-month evaluation period between consecutive phases.
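The timeline arithmetic can be checked with a short script. This is an illustrative sketch only: the helper names `add_months` and `build_schedule` are made up here, and the exact phase boundaries are derived from the durations given in the article rather than from an official schedule.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months (assumes the day exists in the target month)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def build_schedule(start, phases=3, phase_months=12, eval_months=3):
    """Alternate research phases with evaluation periods between them."""
    schedule = []
    cursor = start
    for k in range(1, phases + 1):
        end = add_months(cursor, phase_months)
        schedule.append((f"phase {k}", cursor, end))
        cursor = end
        if k < phases:  # no evaluation period after the last phase
            end = add_months(cursor, eval_months)
            schedule.append((f"evaluation {k}", cursor, end))
            cursor = end
    return schedule


schedule = build_schedule(date(2021, 10, 1))
for name, begin, end in schedule:
    print(f"{name:13s} {begin} -> {end}")

total_months = 3 * 12 + 2 * 3  # three phases plus two evaluation periods
print("total months:", total_months)
```

Three 12-month phases plus two 3-month evaluation periods give 42 months, so a start on 1 October 2021 places the end of the last phase at the beginning of April 2025, consistent with the implementation date mentioned earlier.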

