Computer Technologies in Linguistics

Authors

V. A. Belov

DOI:

https://doi.org/10.34680/VERBA-2024-3(13)-8-23

Keywords:

computational linguistics, corpus linguistics, collocation, language frequency, distributional models, associativity measure, language models, sentiment analysis

Abstract

The article surveys current research in computational and corpus linguistics. Both fields are developing rapidly, which makes a Russian-language overview of the possibilities and achievements of computational linguistics timely. The work relies on theoretical research methods and consists of two sections: the first examines the main studies in corpus linguistics, and the second briefly presents the achievements of computational linguistics. Corpus data have become an important source for linguistic work on a wide range of issues. Corpus information is used in studies of lexical semantics, grammar, discourse, language history, and authors' individual styles, among other topics, as well as for practical problems in translation and language teaching. In general, corpus-based work can be classified as functional and often rests on a distributional (thesaurus) approach to meaning. Computational linguistics is a broad field at the intersection of linguistics, mathematics, and information technology. Its achievements are applied to practical tasks: automatic text generation and understanding, and the indexing and analysis of information. Automatic speech processing relies on formal descriptive models that assume sequential graphematic (phonological), morphological, syntactic, semantic, and discourse analysis. Modern language models, most often trained on specialized corpora, are also used to solve linguistic problems. The article is addressed to linguists, information technology specialists, and students of philology and information sciences.

Author Biography

V. A. Belov , Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russian Federation

Doctor of Philological Sciences, Associate Professor
e-mail: belov.vadim.a@gmail.com

References

Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target: International Journal of Translation Studies, 7(2), 223–243, 10.1075/target.7.2.03bak.

Baranov, A. N., Dobrovolsky, D. O. (eds.) (2021). Corpus Model of Dostoevsky's Idiostyle. E. A. Balashov, A. N. Baranov, D. O. Dobrovolsky, K. L. Kiseleva, A. D. Kozerenko, M. M. Korobova, M. N. Mikhailov, E. A. Osokina, N. A. Fateeva, L. L. Fedorova, E. V. Sharapova. Moscow: LEXRUS Publ. (In Russian).

Belov, V. A. (2020). Semantic Studies of the Organization and Functioning of the Mental Lexicon. Scientific Dialogue, 8, 29–51, 10.24224/2227-1295-2020-8-29-51. (In Russian).

Burgess, C., Lund, K. (2000). The dynamics of meaning in memory. Cognitive dynamics: Conceptual and representational change in humans and machines. Mahwah: Lawrence Erlbaum Associates Publishers, 117–156.

Bybee, J. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14, 261–290, 10.1017/S0954394502143018.

Chebotyreva, K. A. (2024). Application of corpus technology in the process of teaching paremiological units to schoolchildren of specialized classes: Abstract of diss… Candidate of Pedagogical Sciences. Nizhny Novgorod. (In Russian).

Chilingaryan, K. P. (2021). Corpus linguistics: theory VS methodology. Bulletin of Peoples' Friendship University of Russia. Series: Language Theory. Semiotics. Semantics, 1, 196–218, 10.22363/2313-2299-2021-12-1-196-218. (In Russian).

Church, K., Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29, 10.3115/981623.981633.

Dobrovolsky, D. O., Levontina, I. B. (2009). Russian no, German nein, English no: a comparative study of semantics based on parallel corpora. Computational linguistics and intelligent technologies. Proceedings of the international conference "Dialogue 2009". Moscow, Russian State University for the Humanities Publ., 97–101. (In Russian).

Dobrovolsky, D. O. (2003). Corpus of Parallel Texts and Literary Translation. Nauchno-tekhnicheskaya informatsiya. Seriya 2: Informatsionnyye protsessy i sistemy, 10, 13–18. (In Russian).

Dobrushina, N. R. (2009). Corpus-based methods of teaching Russian. National Corpus of the Russian Language. 2006–2008. New results and prospects. St. Petersburg: Nestor-Istoriya Publ., 338–351. (In Russian).

Evgenyeva, A. P. (ed.) (1999). Dictionary of the Russian Language: In 4 volumes. Moscow: Russkiy Yazyk Publ. (In Russian).

Firth, J. R. (1957). Papers in Linguistics: 1934–1951. Oxford: Oxford University Press.

Gorelov, I. N., Sedov, K. F. (2001). Fundamentals of Psycholinguistics. Moscow: Labirint Publ. (In Russian).

Hilpert, M., Gries, S. (2009). Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing, 24(4), 385–401, 10.1093/llc/fqn012.

Jurafsky, D., Martin, J. (2024). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models. Stanford. 

Kamshilova, O. N., Belyaeva, L. N. (2023). Machine translation in the era of digitalization: new practices, procedures and resources. Terra Linguistica, 1, 41–56, 10.18721/JHSS.14105. (In Russian).

Kibrik, A. A., Plungyan, V. A. (2002). Functionalism. Modern American Linguistics: Fundamental Directions. Ed. by A. A. Kibrik, I. M. Kobozeva, I. A. Sekerina. Moscow: Editorial URSS Publ., 276–339. (In Russian).

Landauer, Th., Foltz, P., Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284, 10.1080/01638539809545028.

Levelt, W. (1989). Speaking: From Intention to Articulation. Cambridge: MIT Press.

Litvinova, T. A., Panicheva, P. V. (2024). Individual differences in the associative meaning of a word through the lens of the language model and semantic differential. Theoretical and Applied Linguistics, 10(1), 61–93, 10.18413/2313-8912-2024-10-1-0-5. (In Russian).

Lukashevich, N. V., Levchik, A. V. (2016). Creation of a lexicon of evaluative words of the Russian language RuSentileks. Proceedings of the Open Semantic Technologies for Intelligent Systems (OSTIS-2016) conference. Minsk: Belarusian State University of Informatics And Radioelectronics Publ., 377–382. (In Russian).

Lyashevskaya, O. N. (2016). Corpus tools in grammatical studies of the Russian language. Moscow: Yazyki slavyanskoy kul'tury: Rukopisnyye pamyatniki Drevney Rusi Publ. (In Russian).

McEnery, T., Hardie, A. (2011). Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.

Miller, G., Beckwith, R., Fellbaum, C. (1990). Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), 235–244, 10.1093/ijl/3.4.235.

Plungyan, V. A. (2007). Corpus as a tool and as an ideology. National corpus of the Russian language and problems of humanitarian education. Proceedings of the international scientific conference. Moscow, April 19–20, 2007. Moscow: Higher School of Economics Publ., 64–66. (In Russian).

Radbil, T. B. (2024). Identifying the evaluative potential of a neutral word in poetry (based on online poetry corpora). Critique and Semiotics, 1, 138–157, 10.25205/2307-1753-2024-1-138-157. (In Russian).

Rogers, T. (2008). Computational models of semantic memory. The Cambridge Handbook of Computational Psychology. Cambridge: Cambridge University Press, 226–267, 10.1017/CBO9780511816772.012.

Romanov, A. S., Vasilyeva, M. I., Kurtukova, A. V., Meshcheryakov, R. V. (2018). Sentiment analysis of texts using machine learning methods. Proceedings of the 2nd International Conference "R. Piotrowski's Readings in Language Engineering and Applied Linguistics" (Saint Petersburg, 2017). Saint Petersburg: Creative Commons CCО, 86–95. (In Russian).

Rubtsova, Yu. (2012). Automatic construction and analysis of a corpus of short texts (microblog posts) for the task of developing and training a tone classifier. Knowledge Engineering and Semantic Web Technologies, 1, 109–116. (In Russian).

Rychkova, L. V., Kienya, S. N. (2010). Corpus technologies in teaching Russian as a foreign language. Ethnocultural and sociolinguistic aspects in the theory and practice of teaching languages in non-humanitarian universities: Collection of scientific articles. Minsk: Belarusian National Technical University, 32–43. (In Russian).

Ryukova, A. R. (2024). Corpus-oriented language studies: a brief summary of achievements and challenges. Russian Linguistic Bulletin, 1(49), 10.18454/RULB.2024.49.17. (In Russian).

Savchuk, S. O., Arkhangelsky, T. A., Bonch-Osmolovskaya, A. A., Donina, O. V., Kuznetsova, Yu. N., Lyashevskaya, O. N., Orekhov, B. V., Podryadchikova, M. V. (2024). Russian National Corpus 2.0: New opportunities and development prospects. Voprosy yazykoznaniya, 2, 7–34, 10.31857/0373-658X.2024.2.7-34. (In Russian).

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Smadja, F., McKeown, K., Hatzivassiloglou, V. (1996). Translating Collocations for Bilingual Lexicons: A Statistical Approach. Computational Linguistics, 22(1), 1–38.

Sofronova, E. V. (2024). Automated Sentiment Analysis of Feminitives in the Russian Language: Master's thesis: direction 45.04.04 "Intelligent systems in the humanitarian environment". St. Petersburg: Peter the Great St. Petersburg Polytechnic University, 10.18720/SPBPU/3/2024/vr/vr24-5826. (In Russian).

Teubert, W., Cermakova, A. (2007). Corpus Linguistics: A Short Introduction. London: Bloomsbury Academic.

Tognini-Bonelli, E. (2001). Corpus Linguistics at Work. Philadelphia: John Benjamins Publ., 223.

Ventsov, A. V., Kasevich, V. B. (2003). Problems of Speech Perception. Moscow: Editorial URSS Publ. (In Russian).

Vinogradov, V. V. (1977). Phraseology. Semasiology. Lexicology and Lexicography. Selected Works. Moscow: Nauka Publ., 118–16. (In Russian).

Zakharov, V. P., Bogdanova, S. Yu. (2020). Corpus linguistics. St. Petersburg: St. Petersburg University Publ. (In Russian).

Zalesskaya, V. V. (2014). A program for identifying statistically significant meaningful binomial collocations in the text (based on the Russian language). XVII All-Russian United Conference "Internet and Modern Society" (IMS-2014). St. Petersburg. Electronic resource. Retrieved from: https://ojs.itmo.ru/index.php/IMS/article/download/267/263. (In Russian).

Zaliznyak, Anna A., Levontina, I. B., Shmelev, A. D. (2005). Key ideas of the Russian linguistic picture of the world. Moscow: Yazyki slavyanskoy kul'tury. (In Russian).

Zanettin, F. (2014). Translation-driven corpora: Corpus resources for descriptive and applied translation studies. London; New York: Routledge Publ. https://doi.org/10.4324/9781315759661.

Published

2024-10-30

How to Cite

Belov, V. A. (2024). Computer Technologies in Linguistics. Verba, 3(13), 8–23. https://doi.org/10.34680/VERBA-2024-3(13)-8-23

Issue

Section

Theoretical Comprehension of Innovations, Challenges and Prospects