Widespread Sexual Dimorphism in the Transcriptome of Human Airway Epithelium in Response to Smoking
Henry Shi
Collingwood School
Floor Location: S 043 D

The smoking epidemic is one of the biggest public health issues in modern history. Smoking is a major risk factor for chronic diseases such as chronic obstructive pulmonary disease (COPD), which affects more than 300 million people and is the third leading cause of death worldwide. Although men historically smoked more, the prevalence of smoking among women has risen significantly since the 1960s. Studies have also shown that, for the same amount of smoking, women are at greater risk of COPD than men, particularly after menopause. For the same severity of airflow obstruction, female patients experience more shortness of breath, poorer quality of life, and greater functional impairment than male patients. Women with COPD are also more likely to suffer lung attacks (flare-ups), which are mostly triggered by viral or bacterial infections. The reasons behind this sexual dimorphism in COPD are not well understood.

As a student in a biostatistics research lab at the Centre for Heart Lung Innovation, I therefore investigated the gene expression differences that might underlie it. Working in the R statistical computing environment (version 3.5.0), I normalized and quality-checked publicly available gene expression data from airway epithelial cell samples using the Robust Multi-array Average (RMA) method. I then used linear regression models to identify gene expression changes associated with smoking status and with sex, and tested whether smoking affected gene expression differently in males and females (a sex-by-smoking interaction). The analyses were adjusted for age, ethnicity and pack-years of smoking. The Benjamini-Hochberg method was used to control the false discovery rate (FDR) across the many genes tested. The samples were split into discovery and validation sets.
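The per-gene model with a sex-by-smoking interaction, followed by Benjamini-Hochberg adjustment, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used in this project (which was written in R): it uses ordinary least-squares t-statistics, codes smoking status and sex as 0/1 indicators, and leaves out ethnicity, which would enter as extra dummy-coded columns. The function and argument names are hypothetical. The `smoker * female` column is the interaction term; its coefficient estimates how much the effect of smoking in females differs from the effect in males.

```python
import numpy as np

# Hypothetical column order of the design matrix used below.
COLUMNS = ["intercept", "smoker", "female", "smoker:female", "age", "pack_years"]


def fit_interaction_model(expr, smoker, female, age, pack_years):
    """Fit expr[g, :] ~ 1 + smoker + female + smoker:female + age + pack_years
    for every gene g at once by ordinary least squares.

    expr: genes x samples matrix of normalized (e.g. RMA, log2) expression.
    smoker, female: 0/1 indicators per sample (assumed coding).
    Ethnicity is omitted here for brevity; it would be added as dummy columns.
    Returns (coef, t_stat), each of shape genes x len(COLUMNS).
    """
    smoker = np.asarray(smoker, dtype=float)
    female = np.asarray(female, dtype=float)
    X = np.column_stack([
        np.ones_like(smoker),
        smoker,
        female,
        smoker * female,  # sex-by-smoking interaction term
        np.asarray(age, dtype=float),
        np.asarray(pack_years, dtype=float),
    ])
    n, p = X.shape
    # Solve all genes in one call: X @ B approximates expr.T, B is p x genes.
    B, _, _, _ = np.linalg.lstsq(X, expr.T, rcond=None)
    resid = expr.T - X @ B
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)  # per-gene residual variance
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(np.outer(sigma2, np.diag(xtx_inv)))  # genes x p standard errors
    return B.T, B.T / se


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (same rule as R's p.adjust(method = "BH"))."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

Two-sided P-values for each t-statistic would come from a t distribution with n - p degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), n - p)`), and `benjamini_hochberg` would then be applied separately to the P-values of each coefficient of interest (smoking, sex, and the interaction).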
A gene was considered replicated if it was significantly differentially expressed, with the same direction of effect, in both the discovery and the validation set for a given comparison: smokers versus non-smokers, males versus females, or the sex-by-smoking interaction. This yielded thousands of replicated genes, and I then investigated the biology of the most significant ones (those with the lowest P-values). Future work will focus on uncovering the biological processes in which these genes are involved and on how these differences may reflect the different disease risks of the two sexes, with the ultimate hope of helping clinicians diagnose and treat patients better.
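The replication rule can be expressed as a small filter. The sketch below is a hypothetical illustration in plain Python, not the project's code: it assumes each set's results for one comparison are stored as a dictionary mapping a gene name to a (coefficient, BH-adjusted P-value) pair, and it assumes an FDR cutoff of 0.05, which the text does not state. The gene names in the example are placeholders.

```python
def replicated_genes(discovery, validation, fdr=0.05):
    """Return genes significant in both sets with the same direction of effect.

    discovery, validation: dict of gene -> (coefficient, adjusted_p) for one
    comparison (smoking, sex, or the sex-by-smoking interaction).
    fdr: assumed significance cutoff on the BH-adjusted P-value.
    """
    hits = []
    for gene, (coef_d, padj_d) in discovery.items():
        if gene not in validation:
            continue
        coef_v, padj_v = validation[gene]
        same_direction = coef_d * coef_v > 0  # both up or both down
        if padj_d < fdr and padj_v < fdr and same_direction:
            hits.append(gene)
    # Rank by discovery-set significance, lowest adjusted P-value first.
    return sorted(hits, key=lambda g: discovery[g][1])


# Usage with placeholder genes: GENE_C fails in discovery, GENE_D flips sign.
discovery = {"GENE_A": (1.2, 0.001), "GENE_B": (-0.8, 0.02),
             "GENE_C": (0.5, 0.30), "GENE_D": (0.9, 0.01)}
validation = {"GENE_A": (0.9, 0.004), "GENE_B": (-0.6, 0.03),
              "GENE_C": (0.4, 0.01), "GENE_D": (-0.7, 0.02)}
print(replicated_genes(discovery, validation))  # ['GENE_A', 'GENE_B']
```

Running the filter once per comparison would give the three replicated gene lists, already ordered so that the most significant genes come first.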