Accepted for/Published in: JMIR Cancer

Date Submitted: Dec 22, 2020
Open Peer Review Period: Jan 6, 2021 - Mar 6, 2021
Date Accepted: May 30, 2021

The final, peer-reviewed published version of this preprint can be found here:

Emerging Trends and Thematic Evolution of Breast Cancer: Knowledge Mapping and Co-Word Analysis

SadatMosavi A, Tajedini O, Esmaeili O, Abolhassani Zadeh F, Khazaneha M

JMIR Cancer 2021;7(4):e26691

DOI: 10.2196/26691

PMID: 34709188

PMCID: 8587182

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

  • Ali SadatMosavi; 
  • Oranus Tajedini; 
  • Omid Esmaeili; 
  • Firoozeh Abolhassani Zadeh; 
  • Mahdiyeh Khazaneha

ABSTRACT

Background:

One of the requirements for scientists and researchers entering any field of science is a correct understanding of that field (1). Accordingly, it is important to describe the concepts, history, framework, scope, components, and functions of each science, and to examine its position in the interconnected chain of human sciences, especially its links with the fields on which it most depends (2). In general, this knowledge should give an accurate picture of the areas of activity and applications of that science, and this picture should guide those who have not yet chosen their future research path (3). Science mapping analyzes the publications of a scientific field from different viewpoints and visualizes a general assessment of that field (4). By mapping the course of changes and developments in a field, we can distinguish the subfields that are closest to and farthest from one another. Science maps are drawn to identify the “hot topics” and current trends in a given field (5). A science map based on the research output of scientists in a field makes it possible to study the emergence of new fields and the decline of saturated ones (6, 7). Simply put, a science map depicts the results of analyzing the publications of a scientific field from different angles and provides an overview of that field (8). The scientific domains in these maps are sized in proportion to the level of activity of scientists, and the empty spaces indicate unexplored or unknown domains. Science maps thus show the growth, integration, or disintegration of different scientific fields over time (9).

One of the most widely used methods for analyzing the structure of knowledge in various fields and drawing science maps is co-word analysis, in which the co-occurrence of keywords in the title, abstract, or text of articles is examined. Co-word analysis is therefore performed on a set of journal articles in a specific subject area (10). By analyzing the keywords used in articles in a given research field, we can obtain a picture of the actual content of the topics in that field (11). By measuring the relative intensity of these co-occurrences, simplified representations of the concept networks in a given field can be drawn (12). Co-word analysis can reveal the main topics of the field under study, its semantic structures, and the evolution of those topics over time. It assumes that the most frequent words have a greater impact on a field than less frequent words. Co-word analysis also reveals emerging trends and paradigm changes, which helps predict the direction of future research (13). It can be used as a powerful tool for following structural changes and the development of the socio-cognitive network (17). In addition, it helps identify emerging topics in scientific fields and chart a clear path for future research (14).

One of the most important topics in medical research is breast cancer. Breast cancer is the most frequent cancer among women, affecting 2.1 million women each year, and it causes the greatest number of cancer-related deaths among women (15). In 2019, an estimated 268,600 new cases of invasive breast cancer were expected to be diagnosed among women, and approximately 2,670 among men, in the United States (16). In addition, an estimated 48,100 cases of ductal carcinoma in situ (DCIS) were expected to be diagnosed among women. Approximately 41,760 women and 500 men were expected to die from breast cancer in 2019 (17, 18).

As in other fields of science, new research on breast cancer is published every day, and many studies are similar or overlap. For a variety of reasons, the volume of research in some subfields can increase suddenly, and thematic overlaps can occur during such growth, whereas in other areas little research may be done for months or years. Given the importance of medical research in general, and breast cancer research in particular, it is necessary to provide a broad picture of the status of research in this field. In other words, the structure of knowledge in this field should be revealed using co-word analysis to show how the field has developed over time and, more importantly, which topics, issues, and themes have emerged. The present study uses co-word analysis to examine articles published in the field of breast cancer. By clarifying the co-word structure of the field and the interests and tendencies of its researchers over time, it aims to provide the context needed to correct, sustain, or strengthen research patterns in the field.

Objective:

Accordingly, this study aims to draw a science map, provide a structural analysis, explore the evolution, and find new trends in articles published in the field of breast cancer by addressing the following questions:

1. What are the most important research areas in the field of breast cancer?
2. Under which of the four theme types in the strategic diagram (motor themes, specialized and peripheral themes, emerging or disappearing themes, and general and basic themes) do the breast cancer thematic areas fall?
3. What are the most important issues in terms of frequency and intensity?
4. How have the breast cancer thematic areas developed in different periods?

Methods:

Data Extraction

The MEDLINE database was used in this study to retrieve and extract bibliographic information from breast cancer-related documents. MEDLINE is the U.S. National Library of Medicine® (NLM) premier bibliographic database and contains more than 25 million references to journal articles in the life sciences, with a concentration on biomedicine (19). A distinctive feature of MEDLINE is that its records are indexed with NLM Medical Subject Headings (MeSH®) (20, 21). The subject scope of MEDLINE is biomedicine and health, broadly defined to encompass those areas of the life sciences, behavioral sciences, chemical sciences, and bioengineering needed by health professionals and others engaged in basic research and clinical care, public health, health policy development, or related educational activities. MEDLINE also covers life sciences vital to biomedical practitioners, researchers, and educators, including aspects of biology, environmental science, marine biology, plant and animal science, as well as biophysics and chemistry (21). To further validate the retrieved results, the search strategy was limited to research papers published in core clinical journals. The study period included all the years covered by the database (1950 to March 24, 2020). In other words, this study covered 12,577 articles published over a period of 70 years. The retrieved records were saved as full records in plain text, tab-delimited, and RIS formats. Finally, the saved files were merged into a single file for later use.

Data Analysis

This study is based on co-occurrence analysis. Bibliometric methods explore the impact of a research field, a group of researchers, or a particular paper (22).
In this study we used science mapping, which provides a visual representation and a longitudinal evolution of the interrelations between scientific areas, documents, or authors, reflecting the cognitive architecture of a field (23). We used SciMAT (24) (http://sci2s.ugr.es/scimat), a powerful open-source science mapping tool, to analyze the evolution and relevance of the literature on breast cancer. The tool was designed according to the science mapping analysis (SMA) approach, which allows us to analyze a research field, to detect and visualize its conceptual subdomains (particular topics or themes, or general thematic areas), and to apply a longitudinal framework in order to track the conceptual, intellectual, or social evolution of the field through consecutive time periods (25). Different bibliometric tools are available for this kind of study (25), but SciMAT has some characteristics that distinguish it from other SMA software tools. SciMAT divides the analysis into four phases. A detailed explanation of the four phases can be found elsewhere (7, 26); a brief description follows.

Detection of the research themes. This phase summarizes the first five steps of the science mapping analysis workflow. In each period studied, the research themes are detected by applying a co-word analysis (27) to the raw data of all the published documents in the research field, followed by a clustering of keywords into topics or themes using the simple centers algorithm (28). Formally, the methodological foundation of co-word analysis is the idea that the co-occurrence of keywords describes the content of the documents in a corpus (29). These co-occurrences of keywords can be used to build co-word networks (30), and these networks can be associated with research themes using clustering tools.
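The construction of a co-word network can be sketched in a few lines of Python. This is only an illustration, not the SciMAT implementation: the keyword lists are hypothetical, and the normalization shown is the equivalence index, e_ij = c_ij^2 / (c_i * c_j), which is a common choice in co-word analysis but is an assumption here because the text does not name the measure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count keyword frequencies and pairwise co-occurrences.

    `documents` is a list of keyword lists, one list per document.
    A pair is counted once per document in which both keywords appear.
    """
    keyword_freq = Counter()
    pair_freq = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # ignore repeats within a document
        keyword_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return keyword_freq, pair_freq


def equivalence_index(pair, keyword_freq, pair_freq):
    """Normalized edge weight (assumed measure): c_ij^2 / (c_i * c_j)."""
    a, b = sorted(pair)
    c_ab = pair_freq[(a, b)]
    return c_ab ** 2 / (keyword_freq[a] * keyword_freq[b])


# Hypothetical toy corpus of three documents.
docs = [
    ["breast neoplasms", "prognosis", "mastectomy"],
    ["breast neoplasms", "prognosis"],
    ["breast neoplasms", "mammography"],
]
kf, pf = cooccurrence_counts(docs)
weight = equivalence_index(("prognosis", "breast neoplasms"), kf, pf)
```

Sorting the keywords inside each document keeps every pair in a single canonical order, so the same edge is never counted under two different keys.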
The co-occurrence frequency of two keywords is extracted from the corpus by counting the number of documents in which the two keywords appear together. Once the co-word network is built, the weight of each edge is the co-occurrence value of the terms it links. Next, each edge weight is normalized, using the keyword frequencies and the co-occurrence frequency, to extract the similarity relations between terms (31).

Figure 1. Strategic diagram (23)

Visualizing research themes and thematic network (32). In this phase, the detected themes are visualized by means of two instruments: the strategic diagram (24, 33, 34) and the thematic network. Each theme can be characterized by two measures (12): centrality and density. Centrality measures the degree of interaction of a network with other networks, that is, the strength of a theme's external ties to other themes (35). This value can be taken as a measure of the importance of a theme in the development of the entire research field analyzed. Density measures the internal strength of the network, that is, the strength of the internal ties among all the keywords that describe the research theme. This value can be understood as a measure of the theme’s development (36). Once the centrality and density rankings have been calculated, the themes can be laid out in a strategic diagram. Given both measurements, a research field can be visualized as a set of research themes mapped in a two-dimensional strategic diagram (Fig. 1) and classified into four groups: (a) Themes that are both well developed and important for structuring the research field. They are known as the motor themes of the specialty, given that they present strong centrality and high density. (b) Themes that have well-developed internal ties but unimportant external ties and are therefore of only marginal importance for the field. These themes are very specialized and peripheral. (c) Themes that are both weakly developed and marginal.
The themes in this quadrant have low density and low centrality and mainly represent either emerging or disappearing themes. (d) Themes that are important for a research field but are not developed. This quadrant contains transversal and general, basic themes.

Discovery of thematic areas: temporal or longitudinal analysis. In this phase, the evolution of the research themes over a set of time periods is first detected and then analyzed in order to identify the main general areas of evolution in the research field, their origins, and their interrelationships. This allows us to discover the conceptual, social, or intellectual evolution of the field. SciMAT can build an evolution map (25) and an overlapping items graph (Fig. 2) (37) to detect the evolution areas (see Fig. 3). For this purpose, the inclusion index is used to detect conceptual nexuses between research themes in different periods and, in this way, to identify the thematic areas in a research field. In addition, because each theme is associated with a set of documents, each thematic area can also have an associated collection of documents, obtained by combining the documents associated with its set of themes. In this sense, the evolution map shows the temporal evolution of the research themes of breast cancer, and the overlapping graph represents the number of associated keywords (Fig. 2) (38).

Performance analysis. In this phase, the relative contribution of research themes and thematic areas to the whole research field is measured (quantitatively and qualitatively) and used to establish the most prominent, most productive, and highest-impact subfields. This performance analysis complements the analysis step of the science mapping workflow. The bibliometric indicators used include the number of published documents, the number of citations, and different types of h-index (39, 40).
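As an illustration of one of the performance indicators mentioned above, the basic h-index of a theme can be computed from the citation counts of its documents. The sketch below uses the standard definition (the largest h such that h documents have at least h citations each); the citation counts are hypothetical, and the h-index variants cited in the text are not covered.

```python
def h_index(citations):
    """Return the largest h such that h documents have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for the documents of one theme.
theme_h = h_index([10, 8, 5, 4, 3])
```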
Finally, three diagrams were produced, one for each of the three periods.

Visualization phase. Following the science mapping workflow, visualization techniques were used to represent a science map and the results of the different analyses. The network results from the mapping step were represented by a strategic map, an evolution map, and an overlapping graph. Finally, when the science mapping analysis was finished, experts analyzed the results and maps using their experience and knowledge.

Figure 2. The overlapping graph: The horizontal arrow represents the number of items shared by both periods. The upper incoming arrow represents the number of new items in period 2, and the upper outgoing arrow represents the items that are present in period 1 but not in period 2. (24, 38)

Figure 3. The evolution map: Cluster D1 is discontinued, and Cluster D2 is considered to be a new cluster. A solid line means that the linked clusters share the main item (usually the most significant one). A dotted line means that the themes share elements that are not the main item. The thickness of the edges is proportional to the inclusion index, and the volume of the spheres is proportional to the number of published documents associated with each cluster. (24, 38)
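The quantities shown in the overlapping graph and the evolution map can be illustrated with simple set operations. This is a sketch under stated assumptions: the keyword sets are hypothetical, and the inclusion index is taken to be the size of the intersection divided by the size of the smaller set, which is the usual definition but is not spelled out in the text.

```python
def overlap_summary(period1_items, period2_items):
    """Counts behind the overlapping graph for two consecutive periods."""
    p1, p2 = set(period1_items), set(period2_items)
    return {
        "shared": len(p1 & p2),   # horizontal arrow
        "new": len(p2 - p1),      # incoming arrow of period 2
        "dropped": len(p1 - p2),  # outgoing arrow of period 1
    }


def inclusion_index(theme_a, theme_b):
    """Assumed definition: |A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    a, b = set(theme_a), set(theme_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical keyword sets for two periods.
summary = overlap_summary({"a", "b", "c"}, {"b", "c", "d", "e"})
link = inclusion_index({"a", "b"}, {"b", "c", "d"})
```

An inclusion index of 1.0 means that the smaller theme is fully contained in the larger one, which corresponds to the thickest edges in an evolution map.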

Results:

After the 12,577 records were retrieved, the most frequent keywords (Table 1) and the journals with the most articles (Table 2) were identified.

Table 1: The terms most frequently used in the articles

Rank  Term                                            Number  % of 12577
1     Breast neoplasms                                11855   97.9
2     Prognosis                                       1919    15.8
3     Lymphatic metastasis                            1735    14.3
4     Mastectomy                                      1520    12.6
5     Mammography                                     1435    11.8
6     Neoplasm staging                                1420    11.7
7     Neoplasm recurrence local                       1208    10.0
8     Neoplasm metastasis                             1154    9.5
9     Risk factors                                    1115    9.2
10    Receptors estrogen                              1113    9.2
11    Time factors                                    1014    8.4
12    Age factors                                     955     7.9
13    Carcinoma                                       889     7.3
14    Carcinoma ductal breast                         814     6.7
15    Combined modality therapy                       781     6.4
16    Axilla                                          779     6.4
17    Carcinoma intraductal noninfiltrating           753     6.2
18    Antineoplastic combined chemotherapy protocols  720     5.9
19    Lymph nodes                                     686     5.7
20    Mass screening                                  677     5.6
21    Antineoplastic agents                           674     5.6
22    Biomarkers tumor                                655     5.4
23    Immunohistochemistry                            640     5.3
24    Chemotherapy adjuvant                           594     4.9
25    Neoplasm invasiveness                           591     4.9
26    Lymph node excision                             590     4.9
27    Tamoxifen                                       581     4.8
28    Receptors progesterone                          563     4.6
29    Mastectomy segmental                            543     4.5
30    Menopause                                       526     4.3

Table 2: The 30 journals with the highest number of articles published on breast cancer

Rank  Source title                                          Records  % of 12577
1     Cancer                                                3321     27.4
2     Lancet London England                                 567      4.7
3     American journal of surgery                           523      4.3
4     The British journal of surgery                        454      3.7
5     Radiology                                             453      3.7
6     Journal of clinical pathology                         376      3.1
7     AJR. American journal of roentgenology                347      2.9
8     The New England journal of medicine                   331      2.7
9     JAMA                                                  330      2.7
10    American journal of clinical pathology                320      2.6
11    Annals of surgery                                     295      2.4
12    The American journal of pathology                     289      2.4
13    Medicine                                              256      2.1
14    Archives of surgery Chicago ill 1960                  232      1.9
15    Archives of pathology laboratory medicine             199      1.6
16    The British journal of radiology                      199      1.6
17    Endocrinology                                         194      1.6
18    Surgery                                               193      1.6
19    British Medical Journal (Clinical Research Ed.)       192      1.6
20    Surgery gynecology obstetrics                         186      1.5
21    Journal of the American college of surgeons           177      1.5
22    The journal of clinical endocrinology and metabolism  153      1.3
23    Plastic and reconstructive surgery                    152      1.3
24    British medical journal                               145      1.2
25    The journal of clinical investigation                 126      1.0
26    Southern medical journal                              118      1.0
27    American journal of public health                     106      0.9
28    Annals of internal medicine                           102      0.8
29    The surgical clinics of north America                 96       0.8
30    The American journal of clinical nutrition            94       0.8

Result of thematic period from 1987 to 2020

The following map shows the number of concepts related to the thematic area of breast cancer in three 11-year periods from 1988 to March 31, 2020. The horizontal outgoing arrow represents the number of concepts that were carried into the next period, the vertical outgoing arrow shows the number of concepts that left the period and became less important, and the vertical incoming arrow shows the number of new concepts that received attention. In the second period, 852 new concepts and 690 concepts from the previous period appeared in the articles; of these, 567 concepts continued into the third period, in which 675 new concepts appeared (Fig 1).

Fig 1: Thematic areas in three periods based on centrality and density

In the first period, the highest centrality was found in the Immunohistochemistry (IHC) theme and the highest density in the Soybean theme.

Fig 2: Concepts of the first period based on density and centrality from 1988 to 1998

In the second period, the highest centrality was found in the antineoplastic themes and the highest density in the isoflavones and Enzyme-Inhibitors themes.
Fig 3: Concepts of Breast Cancer disease in the second period based on density and centrality from 1999 to 2009

In the third period, the highest centrality was found in the corticosteroid and antineoplastic themes and the highest density in the vegetable and nuclear-protein themes.

Fig 3: Concepts of Breast Cancer disease in the third period based on density and centrality from 2010 to 2020

The strategic diagram of breast cancer is drawn on the basis of the abundance of articles in the four thematic areas: the motor cluster, the basic and transversal cluster, the highly developed cluster, and the emerging and declining cluster. The most important topics are in the motor cluster; they are presented below for each period in turn.

First period from 1989 to 1998

In the first period, the upper-right quadrant (motor cluster) contains Transcription-Factors, Bone-Marrow-Cell, Immunohistochemistry, and Fibroadenoma, indicating the important role of these concepts in the breast cancer field from 1989 to 1998 (strategic diagram of the first period). A transcription factor is a protein that controls the rate of transcription by binding to a specific DNA sequence. One of the best ways to detect these factors is immunohistochemistry.

Diagram 1: Strategic diagram of the first period (1988 to 1998)

The second period from 1999 to 2009

The concepts of the motor theme are Isoflavones, Enzyme-Inhibitors, Immunohistochemistry, Estrogen, Proportional-Hazards-Model, and Steroid. Soy isoflavones act as enzyme inhibitors, for example of lipoxygenase. There is also a close relationship between isoflavones (phytoestrogens) and the suppression of dendritic cell maturation and function. In addition, soy isoflavones can bind to estrogen receptors and act as estrogen antagonists.

Diagram 2: Strategic diagram of the second period from 1999 to 2009.
The third period from 2010 to 2020

The concepts of the motor theme in the third period are CORTICOSTEROID ANTINEOPLASTIC-AGE, Stem Cell, T-Lymphocyte, Protein-Tyrosine-Kinase, Dietary, and Phosphatidylinositol-3-Kinase, indicating the importance of these topics in this period. Steroids are important biodynamic agents and can be used as specific agents for receptor-mediated diseases such as breast cancer. There are also relationships involving infiltrating T-lymphocytes in invasive breast cancer. Protein-tyrosine phosphatases have a crucial role in regulating stem cell renewal and differentiation. Some studies have reported relationships between DNA-binding protein oxidation and dietary supplements that contain plant extracts and vitamins.

Diagram 3: Strategic diagram of the third period from 2010 to 2020
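The quadrant assignment behind such strategic diagrams can be sketched as follows. This is a minimal illustration, not the procedure used by SciMAT: the theme names and the centrality and density values are hypothetical, and the quadrant boundaries are assumed to be the medians of the two measures.

```python
from statistics import median


def classify_themes(themes):
    """Map each theme name to a strategic-diagram quadrant.

    `themes` maps a theme name to a (centrality, density) pair.
    Assumption: the quadrant boundaries are the medians of each measure.
    """
    centrality_cut = median(c for c, _ in themes.values())
    density_cut = median(d for _, d in themes.values())
    labels = {}
    for name, (centrality, density) in themes.items():
        central = centrality > centrality_cut
        dense = density > density_cut
        if central and dense:
            labels[name] = "motor"
        elif dense:
            labels[name] = "specialized and peripheral"
        elif central:
            labels[name] = "basic and transversal"
        else:
            labels[name] = "emerging or disappearing"
    return labels


# Hypothetical themes with (centrality, density) values.
example = {
    "theme A": (4.0, 40.0),
    "theme B": (1.0, 30.0),
    "theme C": (2.0, 10.0),
    "theme D": (3.0, 20.0),
}
quadrants = classify_themes(example)
```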

Conclusions:

Because this scientometric evaluation of breast cancer research covered a very long time period and a large number of publications, the assessment was divided into three decades from 1989 to 2020. The results confirmed the growth of studies over recent decades and showed that the focus of research differed from year to year. The links between the themes were also shown.

Fig4: Trend of Thematic Themes in Breast Cancer from 1988 to 2020

Thematic themes in the three periods

Using SciMAT (24) and VOSviewer (41), the research output in the field was observed to revolve around 8 areas, shown on the right of the figure in different colors: Radiation injury, Cardiovascular disease, Fibroadenoma, Antineoplastic Agent, Estrogen Antagonist, Immunohistochemistry, Soybean, and Epitopes. Thematic links are shown by solid lines. The size of a node represents the number of documents belonging to each theme, and the color of a node indicates its area. As seen in Figure 4, the analyzed research output is characterized by strong cohesion. Most of the identified topics are gathered in thematic nodes. They arise from a topic that appeared in the previous period, and they show a continuous evolution with almost no jumps or gaps. Regarding the starting period, seven thematic areas started in the first period (right). They can therefore be considered primary subjects in breast cancer. Furthermore, new thematic areas emerged in the second period: proportional hazards, corticosteroids, postoperative, ovarian neoplasm, ethnic group, and cytokine. These emerging thematic areas play an essential role in the development of the field. Regarding theme composition, the thematic area of “immunohistochemistry” is mainly composed of motor themes in all periods. Also, in the third period, ethnic group evolved to Hispanic.
It could be considered a factor in this field. In addition, topics such as stem cell, solid tumor, breast implant, echography, and protein tyrosine kinase emerged in this field, some of them evolving from the second period. The relationship between IHC and cancer biology, which is now better known, has influenced axillary lymph node dissection (ALND). ALAD antibody is an unconjugated rabbit polyclonal antibody from human. Tumor biological factors are different in each tumor that tend to metastasize to visceral lymph nodes from the types of tumors that tend to metastasize to the visceral lymph nodes. With more knowledge and understanding of tumor biology, systemic therapy policies have changed. At present, the decision to initiate and prescribe chemotherapy (systemic therapy) is influenced by patient factors and tumor biological factors as well as the patient's lymphatic status. For example, in some cases a tumor is diagnosed at an early stage by mammography and screening, with no lymphatic involvement, and the biological factors of the tumor play a key role in decisions about adjuvant treatment. Biological factors are also among the factors affecting the decision to start neoadjuvant therapy. For example, +, triple-negative, and HER2NE tumors have a dermatological response to neoadjuvant therapy. In SLNB-positive patients, tumor biological factors such as ER/PR/ were prognostic factors, and it is of interest to know whether ALND changes the patient's cervix or not. According to the AMAROS study, axillary radiotherapy was comparable to axillary dissection for local axillary control and had even fewer side effects. In patients with T1 and T2 masses who were SLNB positive and received axillary radiotherapy, OS and DFS were similar to those of patients who underwent axillary dissection. Therefore, SLNB is currently recommended for many breast cancer patients.
Currently, based on the IBCSG23-01 study, the NCCN guidelines recommend radiotherapy only for patients with positive SLNB (micrometastasis), and no axillary dissection is required for these patients. Thus, SLNB has replaced ALND in many cases. The false-negative rate is low when a dual agent is used and at least 2 SLNs are found in patients with clinical lymph node stage N1. Lymph node biopsy can be performed in patients undergoing neoadjuvant therapy. However, a dual agent is preferably used, and at least 2 lymph nodes should be found, in patients with pre-neoadjuvant clinical lymph node stage N1. In AJCC staging, the use of biomarkers such as ER, PR, and HER2/neu is recommended.

Pathological analysis

Nowadays the basis of breast cancer treatment is complete knowledge of the progression of the disease and of its biological factors (ER/PR/HER2neu). These factors affect the stage of the disease and also indicate the likelihood of tumor recurrence. They can also help predict the response to selected treatments.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.