SEMANTIC PROSODY AND PREFERENCE OF “HEALTHY” AND “UNHEALTHY” COLLOCATIONS IN COVID-19 CORPUS

This study aims to identify the collocations of ‘healthy’ and ‘unhealthy’ and to explore the lexical meaning of those collocations. A corpus-based approach is employed, since corpus data are the sole source of data. A qualitative research method is used to derive hypotheses from the corpus data, which are taken from Sketch Engine. The results demonstrate that the collocations of the two node words fall into different categories. The node word ‘healthy’ is associated with three major semantic preferences: human, animal, and disease. By contrast, the semantic preferences of the node word ‘unhealthy’ are diverse, so their classification is based on the meaning of the collocations. Collocations with negative meaning occur more frequently than those with positive meaning, because they carry the prefixes in- and un-, which reverse the meaning of the base word. Therefore, negative semantic prosody is found more frequently across the two node words, ‘healthy’ and ‘unhealthy’.


Introduction
Language has been widely investigated by scholars all over the world, and the study of language has developed broadly to make such investigation easier. Learning a language is not only about understanding the meaning of individual words, but also about finding their relations with other words through collocation. In the 1950s, the British linguist Firth employed the term collocation to refer to the meaning that words acquire through their interaction with other words (Hu, 2015). The analysis of collocation is based on concordance and, in turn, becomes the root of semantic prosody analysis. Collocational analysis has been the concern of several experts, such as Salama (2011), who focuses on ideological collocation, and Jevric (2019), who examines the different uses of prefixes in derivational analysis.
The examination of semantic prosody and semantic preference has also been of growing interest to many scholars. Sinclair (1987) first observed that some words tend to be followed by items with a positive or negative sense (Cheng, 2013; Begagić, 2013). The terms semantic prosody and semantic preference were first proposed by Sinclair in 1991 (Begagić, 2013). Semantic prosody can be defined as the collaborative meaning of a node word and its collocates, obtained from a larger unit of text (Liu, 2020). Semantic preference, in turn, can be seen as a feature of the collocates, so that it can affect a wider part of the text (Partington, 2004). Therefore, semantic preference also contributes to constructing semantic prosody (Begagić, 2018).
Studies of semantic prosody and semantic preference are inseparable from corpora. This is evident from previous studies of semantic preference and semantic prosody that use corpus data (Nabu, 2020; Prihantoro, 2015; Oster & van Lawick, 2008). In addition, the examination of lexical meaning itself now also uses corpora as its main data (Gulec & Gulec, 2015). The existence of corpora indicates that the expansion of computer technology has enormously affected the study of language. A corpus can be defined as a collection of texts containing written or spoken material, such as transcriptions, compiled for a certain purpose that determines how the text is tagged (Bloomer & Wray, 2006). Thus, the huge number of words in a corpus provides researchers with comprehensive evidence that helps them design their projects in linguistics.
As an approach to linguistics research, corpus study is divided into two kinds: corpus-based and corpus-driven. The corpus-based approach has been described as analysis conducted within a particular framework, and it is therefore limited in scope by the restrictions of that theoretical framework. McEnery & Hardie (2011) define a corpus-based study as one that employs corpus data to explore a theory or hypothesis, typically one established in the existing literature, in order to verify it. The corpus-driven approach, by contrast, uses corpus data as the only source of hypotheses about language.
The present study uses a corpus-based approach, since it both takes the corpus as its source of data and seeks to confirm the existing theory of semantic prosody and semantic preference, which is closely tied to corpus study. The main objective of this research is to identify the collocations of 'healthy' and 'unhealthy' and to explore the lexical meaning of those node words. Therefore, semantic prosody and semantic preference are the major focus of this study.

Literature Review
Semantic prosody was originally introduced by Sinclair (1987). According to Sinclair, some words are associated with pleasant or unpleasant matters (Alrajhi, 2019). Semantic prosody is closely connected with connotation. It usually arises from a similarity in the viewpoint being expressed (Partington, 2004). The notion of semantic prosody exists because it is the means by which speakers convey the purpose of their speech, viewed from both a semantic and a pragmatic standpoint (Liu, 2020). Semantic prosody also refers to the common discourse function of an item, shown by the repeated occurrence of that item's meaning (Sinclair, 1991). Unlike semantic prosody, semantic preference can be viewed as the habitual co-occurrence of a lexical item with a set of terms that express a more specific meaning (Hunston, 2007).
The study of semantic prosody and semantic preference has been a popular issue in corpus-based research. According to Partington (2004), the notion of semantic prosody has been discussed in post-Firthian corpus linguistics by Sinclair (1987) and Stubbs (2001), among others. The ample data provided by a corpus are essential for the examination of semantic prosody and semantic preference.
Semantic prosody and semantic preference have received increasing attention in corpus linguistics during the past decade, for example in investigations of synonymous pairs (Hu, 2015), the semantic prosody of a specific language (Prihantoro, 2015), the semantic prosody of certain words in a corpus (Nabu, 2020), and semantic prosody together with semantic preference (Alrajhi, 2019; Liu, 2020). Corpus-based study has become very significant, since it provides both the tools for analyzing corpus data and an appropriate theory for examining the corpus. Two features are essential in the investigation of a corpus: collocation and concordance. With the development of corpus study, larger and more recent data have become available. Hence, it is crucial to carry out an investigation that is closely related to current phenomena, such as the one that people all over the world have faced since 2019: Covid-19.
The present study explored the occurrence of 'healthy' and 'unhealthy' by considering their collocations. Unlike several investigations that focused on one side only, either semantic prosody or semantic preference, this study combined both in order to produce a comprehensive analysis of the two node words in the Covid-19 corpus. The Covid-19 corpus can represent what this phenomenon currently looks like. Therefore, this study is not only important for reaching pedagogical goals, as previous studies have mentioned (Zhang, 2010; Özbay, 2017), but also beneficial in general, since Covid-19 has been a widely debated issue. In addition, this study may help current and prospective authors of Covid-19 research to present a clear description of Covid-19 that will be significant for people all over the world.

Research Method
The approach used in this study determines its method. Since this study is conducted under the corpus-based approach, it uses a qualitative method to analyze the corpus data. Qualitative research is concerned with structures and patterns, and with what something is like (Litosseliti, 2010). It is an inductive approach that uses textual data to derive theory. This fits the focus of this study, which is to use corpus data to confirm the existing theory.
The data employed in this research are taken from Sketch Engine, one of the well-known corpus tools used by experts all over the world. This tool helps the researcher find the data needed for the collocational analysis in this study. Sketch Engine supports many kinds of analysis, such as keywords, n-grams, word frequency, and concordance. The present study employed the 'concordance' tool to find the collocations of the node words.
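To illustrate what a concordance view provides, the following is a minimal Python sketch of a keyword-in-context (KWIC) listing. It is not Sketch Engine's implementation; the function name `kwic`, its `span` parameter, and the sample sentence are hypothetical and serve only to show how each occurrence of a node word can be displayed with its left and right context.

```python
def kwic(tokens, node, span=4):
    """Return (left context, node, right context) for each occurrence of `node`.

    `span` is the number of tokens kept on each side (an illustrative default).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    # Toy sentence for illustration only; it is not taken from the Covid-19 corpus.
    sample = "samples were collected from healthy individuals and from unhealthy animals".split()
    for left, node, right in kwic(sample, "healthy"):
        print(f"{left:>40} | {node} | {right}")
```

Reading the words that line up on either side of the node word in such a listing is the starting point for identifying its collocates.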
The corpus used in Sketch Engine is 'Covid-19'. It is an existing corpus consisting of published texts whose data were retrieved from https://pages.semanticscholar.org/coronavirus-research (doi:10.5281/zenodo.3715505), accessed on 02-05-2020. 'Covid-19' is an English corpus containing 224,061,570 words, a size that is sufficient for a corpus-based study.
The node words 'healthy' and 'unhealthy' were chosen on several criteria. Since the data belong to the health sciences, the most frequently used terms are those related to medicine. In addition, the 'Covid-19' corpus consists mostly of journal articles, which are closely related to health contexts. The collocation function in Sketch Engine is used to determine the words associated with the node words 'healthy' and 'unhealthy'. These collocations are used to address the research objective: to identify the collocations of 'healthy' and 'unhealthy' and to explore the lexical meaning of those collocations.
The collocations are limited to lexical words, which are considered the meaningful words. Grammatical words, that is, words which are not meaningful unless they are attached to lexical words, are not considered by the researcher. In addition, the collocations of the node words must be related in meaning to the node words, since the analysis in this research follows the classification of lexical meaning. Thus, the meaning of the words is essential.
Semantic Prosody and Preference of "Healthy" and "Unhealthy" Collocations in Covid-19 Corpus, Nafilaturif'ah, Mohamad Irham Poluwa. https://jurnal.uisu.ac.id/index.php/languageliteracy. Nationally Accredited SINTA 3, and indexed in DOAJ and Copernicus.
The node words serve as the starting point for finding the collocations in Sketch Engine. The collocation window extends four words to the left and four words to the right of the node word; this span of four words is adopted to obtain comprehensive data for the collocation analysis. The retrieved data are manually listed in order of MI score. A higher MI score indicates a stronger association between a collocate and the node word. Collocates with high MI scores are more likely to form characteristic collocations than those with low MI scores (McEnery, 2019). The collocations analyzed are drawn from the first page of the list ranked by MI. However, only 20 collocations are chosen for each node word, in order to focus the analysis on each categorization. The collocations were classified into categories made by the researcher in order to find a clear pattern in the data.
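The procedure described above (a window of four words on each side, exclusion of grammatical words, ranking by MI, and selection of the top 20) can be sketched in Python as follows. This is an illustrative sketch, not the Sketch Engine implementation: it assumes the commonly cited form of the MI score, MI = log2(f_AB × N / (f_A × f_B)), where f_AB is the co-occurrence frequency within the window, f_A and f_B are the frequencies of the node word and the collocate, and N is the corpus size. The function name `collocates_by_mi`, its parameters, the small stopword set, and the sample text are all hypothetical.

```python
import math
from collections import Counter


def collocates_by_mi(tokens, node, span=4, stopwords=frozenset(), top_n=20):
    """Rank collocates of `node` within `span` tokens on either side by MI score.

    Assumes MI = log2(f_AB * N / (f_A * f_B)); the exact computation used by
    any particular corpus tool may differ.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    co_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Collect every token inside the window, excluding the node position itself.
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                co_freq[tokens[j]] += 1
    scores = {}
    for word, f_ab in co_freq.items():
        if word in stopwords:
            continue  # skip grammatical words
        scores[word] = math.log2(f_ab * n / (word_freq[node] * word_freq[word]))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy text for illustration only; it is not taken from the Covid-19 corpus.
    text = "the healthy volunteers and the healthy controls were compared with the patients"
    grammatical = frozenset({"the", "and", "were", "with"})
    for word, mi in collocates_by_mi(text.split(), "healthy", stopwords=grammatical):
        print(f"{word}\t{mi:.2f}")
```

Because MI divides by the overall frequency of the collocate, rare words that occur almost exclusively near the node word tend to receive high scores, which is one reason a manual check of the ranked list remains useful.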

The Collocations of 'Healthy'
The node word 'healthy' is accompanied by a number of collocates. The first 20 rows of collocations, ranked by MI score, were taken. The collocations of 'healthy' have various parts of speech. The dominant part of speech is the noun, followed by the adjective. Most of the collocations have positive meaning, which means that they are usually used in positive contexts. A few words have negative meaning or are usually associated with negative contexts.

The Collocations of 'Unhealthy'
The node word 'unhealthy' is accompanied by a number of collocates. The first 20 rows of collocations, ranked by MI score, were taken. The collocations of 'unhealthy' have various parts of speech. The dominant part of speech is the noun, followed by the adjective. The collocations are also diverse in meaning. Some of them relate to negative contexts and have negative meaning. The others carry neither positive nor negative meaning. Furthermore, the detailed use of each collocation in the collocation analysis of 'healthy' is provided in Figure 2.

The Parts of Speech of the Collocations
The most frequent part of speech among the collocations of both node words, 'healthy' and 'unhealthy', is the noun. Nouns are easily found whether the collocations are on the left or on the right side of the node words. There are several reasons for this. The first is that the node words themselves are adjectives. When the node word is on the right, the noun is the part of speech most likely to appear on its left, because nouns are described by adjectives. For instance, in 'the individuals are healthy', the word 'individuals' is a noun described by the adjective. Nouns also occur as collocates on the right side of the node words. For example, the phrase 'healthy people' shows that the adjective is there to modify the noun. Therefore, the noun is strongly associated with the node words 'healthy' and 'unhealthy', since it can appear on both sides, left and right, and has a close relation with the adjective.
Another part of speech that is frequently related to the adjective is the adverb. The adverb most probably occurs before the adjective because adverbs modify adjectives. For example, the phrase 'clinically unhealthy area' can be narrowed down to 'clinically unhealthy' and [...]
Another collocation is actually the same as the node word: unhealthy. According to www.collinsdictionary.com, something that is unhealthy is likely to cause illness or poor health. Causing illness or poor health shows that 'unhealthy' is strongly negative in meaning. Moreover, another collocation begins with the prefix un-, which also produces the opposite meaning. www.collinsdictionary.com provides several definitions of 'unsafe', and all of them refer to the negative meaning 'dangerous'. It is the opposite of 'safe' and may describe somebody who is in danger or being harmed.

Conclusion
The present study finds that there are some frequently used collocations of the node words 'healthy' and 'unhealthy'. The 20 collocations with the highest MI scores were taken in order to examine the use of each collocation in relation to the node words. The results for the node word 'healthy' indicate that three major semantic preferences are associated with it: human, animal, and disease. By contrast, the categories for the node word 'unhealthy' are diverse, so the classification is based on the meaning of the collocations. The study finds that collocations with negative meaning occur more frequently than those with positive meaning. Thus, the semantic prosody of 'unhealthy' is likely to be negative, while that of 'healthy' is likely to be positive. This is because the negative collocations use the prefixes in- and un-, which reverse the meaning of the base words.