Comparative-Contrastive Analysis of Linguistic Resources for Corpus Analysis of Texts

A. V. Dmitrijev; E. S. Krupnova

doi:10.34680/VERBA-2024-3(13)-24-35

Authors

A. V. Dmitrijev Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russian Federation https://orcid.org/0000-0003-3632-793X
E. S. Krupnova Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russian Federation https://orcid.org/0009-0007-3127-2737

Keywords:

natural language processing, corpus linguistics, linguistic corpora, corpus manager, corpus stylistics, stylistic corpus analysis

Abstract

Computational linguistics has developed actively over the last few decades. The paper discusses the main task of corpus linguistics, the corpus analysis of written natural-language texts, and the linguistic resources used to carry it out. Corpus analysis is a method of language research that uses large collections of texts (corpora) to obtain statistical and linguistic data about a language. Linguistic resources such as dictionaries, thesauri, and grammatical databases greatly extend the capabilities and accuracy of corpus analysis. Corpus linguistics also covers the development of corpus managers, programs that process texts, build concordances, and search for keywords and collocations. The paper briefly describes the functionality of WMatrix, WordSmith, GATE, AntConc and Sketch Engine and compares and contrasts their characteristics. It concludes that the programs differ in feature set, data-saving options, input text format and accessibility, and it suggests directions for their use in research and practice. Linguistic resources can be useful for the stylistic analysis of texts, the study of the linguistic features of an author's style, foreign-language teaching (for example, grammar or vocabulary), computer lexicography, discourse analysis and other areas. As an example, the paper presents a corpus analysis of the theme of famine during the blockade of Leningrad, carried out with AntConc. In that study, 749 fragments of recollections of Leningrad citizens were collected on the basis of 15 frequency words, and a frequency dictionary of 158 words was compiled. The tools considered not only increase the accuracy of analysis but also extend its possibilities and can be integrated into software for automating corpus analysis. The choice of an appropriate tool depends on the scope and depth of the text analysis.
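
The core operations attributed to corpus managers above (frequency lists, concordances, collocation search) can be illustrated with a minimal Python sketch. It is not how AntConc or any of the other programs is implemented; the function names, the window size and the three-sentence sample text are assumptions invented for illustration and are not taken from the study's corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (apostrophes kept inside words)."""
    return re.findall(r"[^\W_]+(?:'[^\W_]+)?", text.lower())


def frequency_list(tokens, top_n=None):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, keyword, window=3):
    """Return keyword-in-context lines as (left context, keyword, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=3):
    """Count the words that occur within `window` tokens of each keyword hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Invented toy text, used only to exercise the functions.
sample = (
    "Bread was rationed in the city. "
    "People queued for bread every morning. "
    "The bread ration fell again in winter."
)
tokens = tokenize(sample)

if __name__ == "__main__":
    print(frequency_list(tokens, top_n=3))
    for left, word, right in concordance(tokens, "bread", window=2):
        print(f"{left:>15} [{word}] {right}")
    print(collocates(tokens, "bread", window=2).most_common(3))
```

Raw co-occurrence counts are used here for brevity. Dedicated corpus tools typically rank collocates with association measures and handle lemmatization and stop words, which this sketch leaves out.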

Author Biographies

A. V. Dmitrijev, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russian Federation

Candidate of Philological Sciences, Associate Professor
e-mail: avd84@list.ru

E. S. Krupnova, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russian Federation

Master's degree, specialist in educational and methodological work
e-mail: krupnalena@mail.ru
