All data included in this analysis were obtained from the UK Biobank, a large cohort of 502,507 participants recruited between 2006 and 2010 at 22 assessment centers across the UK. All participants provided written informed consent and completed a self-administered touchscreen questionnaire, a short computer-assisted interview, and physical measurements. Biological samples, including blood, were also collected at the assessment centers under strict quality control during the baseline period.
In this study, we excluded 46,533 participants with cancer at baseline, 30,035 participants with no CRP information, and 4,975 participants with no genetic data, leaving a total of 420,964 participants for analysis.
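As a quick consistency check, the exclusion counts reported above can be subtracted from the full cohort in sequence. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Exclusion flow using the counts reported in the text.
cohort_size = 502_507
exclusions = {
    "cancer at baseline": 46_533,
    "no CRP information": 30_035,
    "no genetic data": 4_975,
}

remaining = cohort_size
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"after excluding {reason}: {remaining:,}")

# The final count matches the 420,964 participants reported above.
print(f"analytic sample: {remaining:,}")
```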
Assessment of exposure, outcome and covariates
CRP concentration was measured at baseline by high-sensitivity immunoturbidimetric analysis on a Beckman Coulter AU5800, with values ranging from 0.08 to 79.96 mg/L. Outliers were capped at the 1st percentile (Q1) or 99th percentile (Q99) of CRP levels. Details of the collection and processing of blood samples have been described elsewhere.
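The handling of outliers at the 1st and 99th percentiles can be read as winsorization, that is, replacing values beyond each cut-off with the cut-off itself. The following is a minimal sketch under that assumption; the function name and the percentile interpolation rule (Python's "inclusive" method) are illustrative choices, not details taken from the study.

```python
from statistics import quantiles

def winsorize(values, lower_pct=1, upper_pct=99):
    """Cap values below the lower percentile and above the upper percentile.

    Assumption: percentiles are computed with linear interpolation over the
    observed data ("inclusive" method); the study does not state its rule.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

# Toy example: 0..100 has its 1st percentile at 1 and 99th percentile at 99.
toy = list(range(101))
capped = winsorize(toy)
print(min(capped), max(capped))
```

Capping rather than dropping extreme values keeps every participant in the analysis while limiting the leverage of very high CRP measurements.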
Cancer outcomes were defined based on ICD-10 coding and sourced from the National Cancer Registry. Follow-up time was defined as the period from enrollment to the first cancer diagnosis or cancer registration, loss to follow-up, or the end of follow-up (31 October 2015 for Scotland and 31 March 2016 for England and Wales). After excluding site-specific cancers with fewer than 100 incident cases, we included all-cause cancer and 21 site-specific cancers in this study (Supplementary File 2: Table S1).
Variables that might affect the association between CRP and cancer risk based on previous studies were considered as covariates in our analysis, including age, family history of cancer, body mass index (BMI), height, smoking status, alcohol consumption, and physical activity for both men and women, as well as menopause, oral contraceptives, and hormone replacement therapy for women. We also included gender, race, education, Townsend Deprivation Index, and assessment center as covariates. These covariates were collected using a touchscreen questionnaire or measured by trained personnel at baseline, and no covariate had more than 2.0% missing values (Supplementary File 2: Table S2). Missing values for continuous covariates were replaced with the sex-specific mean of each variable, and missing values for categorical covariates were assigned to an “unknown” category.
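The two imputation rules described above can be sketched as follows. The record layout (a list of dictionaries with a `sex` key and `None` for missing values) and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def impute_continuous_by_sex(records, field):
    """Replace missing (None) values of `field` with the mean among same-sex records."""
    totals, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        if rec[field] is not None:
            totals[rec["sex"]] += rec[field]
            counts[rec["sex"]] += 1
    means = {sex: totals[sex] / counts[sex] for sex in counts}
    # Note: a sex group with no observed values would raise KeyError here.
    return [
        {**rec, field: means[rec["sex"]]} if rec[field] is None else dict(rec)
        for rec in records
    ]

def impute_categorical_unknown(records, field):
    """Assign missing (None) values of `field` to an explicit "unknown" category."""
    return [
        {**rec, field: "unknown"} if rec[field] is None else dict(rec)
        for rec in records
    ]

# Toy data (hypothetical values).
people = [
    {"sex": "F", "bmi": 20.0, "smoking": "never"},
    {"sex": "F", "bmi": None, "smoking": None},
    {"sex": "F", "bmi": 30.0, "smoking": "current"},
    {"sex": "M", "bmi": 28.0, "smoking": "former"},
]
filled = impute_categorical_unknown(impute_continuous_by_sex(people, "bmi"), "smoking")
print(filled[1])
```

Keeping "unknown" as its own category retains participants with missing categorical data in the models instead of dropping them.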
Genome-wide genotyping was performed using the Affymetrix UK BiLEVE Axiom Array or the Affymetrix UK Biobank Axiom Array. The two arrays share 95% of their markers. Imputation was performed using SHAPEIT3 and IMPUTE3 based on the merged UK10K and 1000 Genomes Phase 3 panels. Markers with a minor allele frequency > 0.001 and an info score > 0.3 were retained in the UK Biobank data. Detailed information on genotype quality, quality control, and genotype imputation has been described in previous studies.
Genetic instrument for serum CRP levels
A total of 52 susceptibility loci associated with serum CRP concentration were identified in a previous GWAS and were used to construct the CRP genetic instrument by calculating a weighted genetic risk score (wGRS). The genetic instrument was strongly related to serum CRP concentration (F statistic of 216) and explained 2.6% of the CRP variance in this study (Supplementary File 1). In addition, five SNPs associated with both colorectal cancer and serum CRP concentration were excluded in a sensitivity analysis to assess the validity of the instrument (Supplementary File 2: Table S3).
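A weighted genetic risk score is commonly computed as the sum of each SNP's effect-allele dosage multiplied by its GWAS effect size, and instrument strength is commonly summarized with the F statistic F = (R²/k) × (n − k − 1) / (1 − R²). The sketch below assumes both conventions; the text does not state which exact formulas were used, and the function names are illustrative. Under these assumptions, the reported values (R² = 2.6%, n = 420,964, k = 52 loci) give an F statistic of about 216.

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one participant.

    dosages: effect-allele counts per SNP (0, 1, 2, or imputed fractions).
    weights: per-SNP effect sizes on CRP from the external GWAS.
    Assumption: a plain weighted sum without rescaling.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

def f_statistic(r_squared, n_samples, n_instruments):
    """Instrument-strength F statistic from variance explained."""
    return (r_squared / n_instruments) * (n_samples - n_instruments - 1) / (1.0 - r_squared)

# Toy score with two hypothetical SNPs.
print(weighted_grs([2, 1], [0.10, 0.20]))

# Values reported in the text: 2.6% variance explained, 420,964 participants, 52 loci.
print(round(f_statistic(0.026, 420_964, 52)))
```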
Cox proportional hazards regression was performed to assess the association between CRP and cancer risk. Schoenfeld residuals and log-log plots were used to test the proportional hazards assumption. The time scale in the Cox regression ran from enrollment to cancer diagnosis, death, withdrawal from the study, or the end of follow-up, whichever came first. We estimated the hazard ratio (HR) associated with CRP (per 1 mg/L increase) for each site-specific cancer in all eligible participants and re-estimated the HRs after grouping participants into low (≤3 mg/L) and high (>3 mg/L) CRP levels. We also applied restricted cubic spline analysis to examine possible nonlinear associations between serum CRP concentration and cancer risk. To balance goodness of fit against overfitting in the splines for cancer, we tested three to five knots and chose the number with the lowest Akaike Information Criterion (AIC) value; if different numbers of knots gave the same AIC, the smallest number was chosen. With the exception of lung cancer (4 knots at the 5th, 35th, 65th, and 95th percentiles of CRP), we fitted the models for all-cause and other site-specific cancers with 3 knots at the 10th, 50th, and 90th percentiles of CRP. We used a likelihood ratio test to calculate the P value for nonlinearity by comparing the model with only a linear term to the model with linear and cubic spline terms. We further performed subgroup analyses to assess potential effect modification by age, gender, and smoking status using likelihood ratio tests.
To examine the robustness of our results, we performed several sensitivity analyses: (1) re-analyzing the association between log-transformed CRP levels and cancer risk, (2) excluding, or including only, participants diagnosed with cancer within the first two years of follow-up, to avoid possible reverse causality, (3) excluding participants with a CRP value > 10 mg/L to avoid the effects of an acute severe infection, (4) additionally adjusting for cardiovascular diseases and diabetes, and (5) additionally adjusting for regular use of aspirin and ibuprofen.
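The AIC-based choice of the number of spline knots and the likelihood ratio test for nonlinearity can be sketched as below. The AIC and log-likelihood inputs are hypothetical placeholders for values that would come from fitted Cox models. The degrees of freedom assume the usual restricted cubic spline parameterization, in which k knots add k − 2 nonlinear terms to the linear term; the text does not state the parameterization, so this is an assumption.

```python
import math

# Knot positions (percentiles of CRP) as reported in the text.
KNOT_PERCENTILES = {3: (10, 50, 90), 4: (5, 35, 65, 95)}

def choose_knot_count(aic_by_knots):
    """Pick the knot count with the lowest AIC; ties go to the fewest knots."""
    return min(aic_by_knots, key=lambda k: (aic_by_knots[k], k))

def nonlinearity_p_value(loglik_linear, loglik_spline, n_knots):
    """Likelihood ratio test of the linear model against the spline model.

    Assumption: the spline adds n_knots - 2 nonlinear terms, so the test has
    n_knots - 2 degrees of freedom. Closed-form chi-square tail probabilities
    are used for 1 and 2 degrees of freedom (3 and 4 knots).
    """
    stat = max(2.0 * (loglik_spline - loglik_linear), 0.0)
    df = n_knots - 2
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers 3 or 4 knots")

# Hypothetical AIC values for 3, 4, and 5 knots: the tie is resolved toward 3.
print(choose_knot_count({3: 1520.4, 4: 1520.4, 5: 1521.9}))

# Hypothetical log-likelihoods for a 3-knot model.
print(nonlinearity_p_value(-1000.0, -998.0, 3))
```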
Both linear and non-linear causal relationships between CRP concentration and cancer risk were assessed in this study. To assess potential linear associations, we performed a two-stage MR analysis. In the first stage, we estimated fitted values from a regression of CRP on the wGRS; in the second stage, the predicted values were entered into a Cox regression model for cancer risk. Covariates, including age at baseline, gender, and the first 10 genetic principal components, were adjusted for in both stages. In addition, several sensitivity analyses were performed: (1) we re-estimated the causal relationships between log-transformed CRP levels and cancer risks, (2) two-stage MR was performed only in participants of British ancestry, and (3) rs2794520, the strongest SNP in the previous GWAS, was used as the instrumental variable to minimize the possibility of introducing horizontal pleiotropy.
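The first stage of the two-stage approach can be illustrated with a simple least-squares regression of CRP on the wGRS. This sketch deliberately omits the covariates (age, gender, and genetic components) and the second-stage Cox model, which would require a survival analysis library; the function name and toy data are hypothetical.

```python
def first_stage(crp, wgrs):
    """Regress CRP on the wGRS and return (slope, intercept, fitted, residuals).

    Simplification: unadjusted ordinary least squares with one predictor.
    """
    n = len(crp)
    mean_x = sum(wgrs) / n
    mean_y = sum(crp) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wgrs, crp))
    sxx = sum((x - mean_x) ** 2 for x in wgrs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in wgrs]         # genetically predicted CRP
    residuals = [y - f for y, f in zip(crp, fitted)]       # CRP minus predicted CRP
    return slope, intercept, fitted, residuals

# Toy data (hypothetical values).
slope, intercept, fitted, residuals = first_stage(
    crp=[1.0, 3.0, 2.0, 4.0], wgrs=[0.0, 1.0, 2.0, 3.0]
)
print(slope, intercept)
```

The fitted values stand in for genetically predicted CRP in the second stage, so the estimated effect on cancer risk reflects only the variation in CRP attributable to the genetic score.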
For the non-linear MR analysis, the sample was stratified into three strata according to residual CRP (observed CRP minus genetically predicted CRP). Next, we assessed the exposure-outcome association using the piecewise linear method, in which each stratum contributes a line segment whose gradient is the localized average causal effect (LACE) estimate for that stratum. Two tests for nonlinearity were then applied: (1) a heterogeneity test using Cochran’s Q statistic to assess differences between the LACE estimates, and (2) a trend test, which is a meta-regression of the LACE estimates on the mean value of CRP in each stratum.
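The grouping by residual CRP and the heterogeneity test can be sketched as follows. The equal-sized tercile split, the function names, and the toy LACE estimates are assumptions for illustration; the P value uses the closed-form chi-square tail for 2 degrees of freedom, which applies when exactly three estimates are compared.

```python
import math

def stratify(residuals, n_strata=3):
    """Assign each participant to a stratum (0..n_strata-1) by rank of residual CRP."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_strata // n
    return labels

def cochran_q(estimates, std_errors):
    """Cochran's Q for heterogeneity between stratum-specific estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]           # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))

def q_p_value_three_strata(q):
    """Chi-square tail probability with 2 degrees of freedom (three strata)."""
    return math.exp(-q / 2.0)

# Toy residuals and hypothetical LACE estimates with standard errors.
print(stratify([0.5, -1.0, 2.0, 0.0, 1.5, -0.5]))
q = cochran_q([0.0, 0.1, 0.2], [0.1, 0.1, 0.1])
print(q, q_p_value_three_strata(q))
```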
All analyses were performed with R (version 3.6.0), and a two-sided P value of <0.05 was considered statistically significant. To limit the inflation of false positives, we calculated false discovery rate (FDR)-adjusted P values in the main analyses. Linear and nonlinear MR analyses were performed using the MendelianRandomization and nlmr packages, respectively.
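The text does not name the FDR procedure. A common choice is the Benjamini-Hochberg method, sketched below under that assumption; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical P values for four cancer sites.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With 22 outcomes tested (all-cause plus 21 site-specific cancers), an adjustment of this kind controls the expected proportion of false positives among the associations declared significant.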