Visualizing the World's Scientific Output Profiles Using Big Data
Rex Chen
University Transition Program
Floor Location: S 232 F

Automated methods for analyzing, modeling, and visualizing large-scale scientometric data make it possible to depict the state of world scientific development. In this study, the minimum span clustering and minimum spanning tree methods, which are automated and fast to compute relative to other methods, were combined to investigate the scientific output profiles and performance of 100 countries based on their publications in 1905 science journals across sixteen knowledge domains from 1994 to 2011. The clustering of these countries into twelve knowledge production groups was closely related to their geographic location, ethnicity, and GDP per capita: scientists in geographically proximal countries usually have similar cultural backgrounds, share similar interests and strengths, and are able to interact more frequently, while a country's research and development investment influences its specializations in scientific research. The performance of each group in each of the sixteen knowledge domains was evaluated by three measures: the specialization index, or the degree of specialization of a country in a given field; the relative impact, or the impact of a country on a scientific field based on the number of citations to papers published in that country; and the volume of publication. Relative impact was generally anti-correlated with the specialization index. A minimum spanning tree was constructed to visualize the scientific knowledge production network as well as inter-group and intra-group relations, and the diagram was laid out to preserve the distances of geographic location and profile similarity by minimizing a cost function. The Euro-American group overall had a high impact, large publication volume, and high GDP, making it the central hub of the scientific knowledge production network.
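To make the specialization index and the minimum spanning tree more concrete, the following Python sketch works through a toy example. It rests on several assumptions that are not taken from the study. The specialization index is computed as a country's share of its own output in a domain divided by the world's share in that domain, which is one common definition. Profile similarity is measured with Euclidean distance, and the tree is built with Prim's algorithm. The country names and publication counts are invented, and the sketch covers neither minimum span clustering nor the geographic cost function.

```python
from math import sqrt


def specialization_index(counts):
    """Assumed definition: (country share in domain) / (world share in domain).

    counts maps a country name to a list of publication counts per domain.
    """
    n_domains = len(next(iter(counts.values())))
    world_total = sum(sum(row) for row in counts.values())
    world_by_domain = [
        sum(row[d] for row in counts.values()) for d in range(n_domains)
    ]
    profiles = {}
    for country, row in counts.items():
        country_total = sum(row)
        profiles[country] = [
            (row[d] / country_total) / (world_by_domain[d] / world_total)
            for d in range(n_domains)
        ]
    return profiles


def profile_distance(p, q):
    """Euclidean distance between two profiles (an assumed similarity measure)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of pairwise profile distances."""
    nodes = list(profiles)
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Pick the cheapest edge that connects the tree to a new node.
        u, v = min(
            ((a, b) for a in in_tree for b in nodes if b not in in_tree),
            key=lambda e: profile_distance(profiles[e[0]], profiles[e[1]]),
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


# Hypothetical publication counts for three countries in two domains.
counts = {
    "A": [80, 20],
    "B": [70, 30],
    "C": [10, 90],
}
si = specialization_index(counts)
mst = minimum_spanning_tree(si)
print(mst)
```

In this toy data, countries A and B have similar profiles and end up adjacent in the tree, while C attaches through whichever of the two it most resembles. An index above 1 means a country publishes relatively more in that domain than the world average.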
In general, the results are consistent with the findings of previous studies that used a country profile index to diagnose the scientific activity of a country, including findings that the scientific interaction between cities is inversely proportional to the square of their distance, that knowledge diffuses through ethnic networks, and that research is dependent on GDP. However, this approach is more systematic and provides a more comprehensive view of the global scientific network. Additional work could examine how the clustering of world scientific output profiles evolves over time, or how the clustering results change when the datasets are varied. It is of particular interest to further investigate the pattern of international collaboration within a knowledge production group or across groups. Such studies could provide a panoramic view of global scientific activities and help countries examine their scientific output performance in order to adjust their policies and allocation of resources for future scientific development and economic growth.