How TSU united linguists and programmers

12 August 2019

TSU computational linguists have created a program that generates news headlines in Russian and, in the future, will write the news stories themselves. The text of a finished news item is loaded into a special window, and after a few seconds the program produces its own version of the headline. It took master’s students of the TSU Computer and Cognitive Linguistics program a year to create the web service, which extracts facts from texts and automatically generates news.

The developers (the project is being implemented jointly with Elecard-Med) are confident that a service based on machine learning will be in demand at today’s news agencies.

“At the first stage, we solved the problem of summarizing the text, that is, extracting facts and the connections between them. We needed to teach the program to generate a headline, and we succeeded. The next stage is to write a lead (the first paragraph of a news story), and then the full story,” says Zoya Rezanova, head of the TSU Laboratory of Cognitive Studies of Language and head of the master’s program Computer and Cognitive Linguistics.

This market is becoming more competitive, but so far there are only English-language prototypes. They cannot simply be carried over to Russian, because the grammar differs radically: Russian has noun and adjective declension and verb conjugation, and its word order is much freer.

Therefore, to train the machine, it was not enough to write a program and load a data set (tens of thousands of news items) into it. The team needed specialists who understand the linguistic intricacies, that is, how language works in general and how an individual produces speech.

“A programmer who comes to our master’s program will not become a linguist, just as a philologist will not become a Big Data specialist. But we are training specialists who can work at the intersection of disciplines. Interdisciplinarity is one of the main trends in education,” emphasizes Zoya Rezanova.

“At the beginning of the 21st century, linguistics changed its appearance: new technologies brought research to a new level. But interdisciplinary areas were born to solve social problems. Therefore, two years ago, when we created a new master’s program based at the StrAU Institute of Humans of the Digital Age, we conceptually combined two areas, cognitive linguistics and computational linguistics,” says Zoya Rezanova.

Cognitive linguistics explores how language interacts with mental mechanisms. Language and brain, cognitive modeling in PR activities, statistical methods in humanities research, linguistic information processing, and cognitive psychology are among the main disciplines that master’s students study.

Zoya Rezanova explains: “Language is not what is written and not what is spoken. It is born in our consciousness; consciousness exists in our body, and our body and personality exist in a natural and sociocultural environment. Each of these factors affects how speech is generated. We conduct many behavioral experiments to better understand the nature and structure of the Russian language. For example, the Laboratory of Linguistic Anthropology is investigating the reading process (using an eye-tracking sensor) to understand how we process texts.”

Computational linguistics is another area that helps us understand changes in modern communication, and perhaps improve the quality of human-machine dialogue. The program covers both linguistic theory and programming languages (Python and R).

“Big data analysis is an undisputed trend, but a huge share of that information is language information, so text analysis is the way to get at it. We need to create effective mechanisms, automatic analyzers, that extract information from text without human intervention,” says Zoya Rezanova.

The next fundamental task is not just to replace a person in extracting information, but also to generate information and create secondary texts. The meaningful responses of bots, for example a bank’s chatbot, are also a computational linguistics technology. And the day is not far off when bots will learn to recognize intonation and emotions and to speak like people.