Pakistani Medical Corpus 2024

About the Corpus

The Pakistani Medical Corpus 2024 (PMC 2024) is an extensive compilation of contemporary medical English as used in Pakistan, collected from a variety of real-life contexts. The corpus contains hundreds of millions of words from medical fields and is designed to support research and teaching on contemporary English usage in Pakistan. PMC 2024 is the result of a collaboration between the Department of English at Emerson University, Multan, and the Institute of Corpus Studies and Applications (ICSA) at Shanghai International Studies University (SISU), China. This international partnership reflects both the global relevance of the project and its innovative nature.

About the Written PMC2024

The Written PMC2024 is a large, carefully curated collection of written texts covering many facets of medical English usage in Pakistan. It is a repository of hundreds of millions of words drawn from Pakistani medical journals, medical web blogs, case studies, e-health public information platforms, medical textbooks used in Pakistani medical institutions, and doctoral dissertations and theses by Pakistani students.

The Written PMC2024 was compiled systematically. A dedicated team of corpus students and researchers gathered data from both printed publications and online resources. This collaborative effort brought together thousands of authentic samples spanning medical journals, dissertations, blogs, case studies, and other pertinent sources, with the aim of making the corpus a reliable and comprehensive resource.

By integrating this range of materials, the Written PMC2024 offers a comprehensive view of how medical English is used across different contexts within Pakistan. It serves as a resource for linguistic research, educational curriculum development, and practical applications in medical communication and education. The corpus both enriches the understanding of contemporary medical English and contributes to raising healthcare communication and education standards in Pakistan.

The Written PMC2024 is an ongoing project that continues to evolve. For the latest updates and detailed information on its progress, including the milestones reached in compiling the corpus, please visit the Emerson University, Multan website.

Corpus Statistics

Category           | Total Journals | Available Journals | Downloaded Journals | Total Articles Downloaded | Total Storage Size
Dentistry          | 6              | 6                  | 6                   | 2769                      | 1.05 GB
Health Professions | 51             | 44                 | 44                  | 14490                     | 14.7 GB
Veterinary Science | 5              | 5                  | 5                   | 1882                      | 1.03 GB
Total              | 62             | 55                 | 55                  | 19141                     | 16.8 GB
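
The Total row can be cross-checked against the per-category rows. The following is a minimal sketch in Python, not part of the corpus tooling: the variable names are illustrative assumptions, and the figures are copied directly from the table above. It sums each column, compares the result with the reported totals, and computes the share of listed journals that were downloaded in each category.

```python
# Per-category rows copied from the Corpus Statistics table:
# (total journals, available journals, downloaded journals,
#  articles downloaded, storage size in GB)
rows = {
    "Dentistry": (6, 6, 6, 2769, 1.05),
    "Health Professions": (51, 44, 44, 14490, 14.7),
    "Veterinary Science": (5, 5, 5, 1882, 1.03),
}

# The "Total" row as reported in the table.
reported_total = (62, 55, 55, 19141, 16.8)

# Sum each column across the three categories.
computed = [sum(column) for column in zip(*rows.values())]

# Count columns must match exactly; storage is compared after
# rounding to one decimal place, the precision used in the Total row.
counts_match = computed[:4] == list(reported_total[:4])
storage_match = round(computed[4], 1) == reported_total[4]

# Share of listed journals that were actually downloaded, per category.
coverage = {name: row[2] / row[0] for name, row in rows.items()}

print("Counts consistent:", counts_match)
print("Storage consistent:", storage_match)
for name, share in coverage.items():
    print(f"{name}: {share:.0%} of listed journals downloaded")
```

With the figures as published, both checks pass. The only category with less than full coverage is Health Professions, where 44 of the 51 listed journals were available and downloaded.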

Supplementary Datasets

Type                    | Number of Sources | Number of Entries
Medical Case Studies    | 23                | 756
Pakistani Medical Blogs | 27                | 1709
PhD Dissertations       | 10                | 354