Computation of genome-wide LD scores and matrices from the SG100K resource
Main Applicant – Dr Li Jingmei, Group Leader, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR)

Linkage disequilibrium score regression (LDSC) is a statistical genetics method for separating the contributions of true polygenic signal and of confounding, such as population stratification, to the test statistics of genome-wide association studies (GWAS). It regresses the GWAS association statistic of each single-nucleotide polymorphism (SNP) on that SNP's linkage disequilibrium (LD) score. The LD score of a SNP is the sum of its squared correlations (r^2) with the other SNPs in a surrounding window, so it measures how much nearby genetic variation the SNP tags.
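The LD score definition above can be made concrete with a minimal sketch. It assumes a small in-memory dosage matrix, a fixed base-pair window, and no monomorphic SNPs. The function name, the window size and the base-pair (rather than centimorgan) window are illustrative assumptions and do not describe the pipeline that will be used for SG100K. The sample-size adjustment of r^2 follows the estimator used by the LDSC software.

```python
import numpy as np

def ld_scores(dosages, positions, window_bp=1_000_000):
    """Toy LD score calculation (hypothetical helper, illustration only).

    dosages   : (n_samples, n_snps) array of allele dosages in [0, 2]
    positions : (n_snps,) array of base-pair positions on one chromosome
    window_bp : assumed window; LDSC itself commonly uses a 1 cM window
    """
    dosages = np.asarray(dosages, dtype=float)
    positions = np.asarray(positions)
    n = dosages.shape[0]

    # Standardise each SNP to mean 0, variance 1 (assumes no monomorphic SNPs).
    z = (dosages - dosages.mean(axis=0)) / dosages.std(axis=0)

    # Pairwise Pearson correlations between SNPs, then squared.
    r2 = ((z.T @ z) / n) ** 2

    # Approximately unbiased r^2: the raw sample r^2 is biased upwards.
    r2_adj = r2 - (1.0 - r2) / (n - 2)

    # Sum r^2 over SNPs inside the window (the SNP itself is included).
    in_window = np.abs(positions[:, None] - positions[None, :]) <= window_bp
    return (r2_adj * in_window).sum(axis=1)
```

A dense SNP-by-SNP matrix is only feasible for toy inputs; at genome scale the same sum is taken over a banded or block-sparse LD matrix.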

LDSC estimates how much of a trait’s heritability (genetic contribution) can be attributed to SNPs, partitions this heritability across categories of SNPs, and estimates genetic correlations between traits. It scales well to large studies because it requires only GWAS summary statistics rather than individual-level genotype data.
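As a hedged illustration of how heritability is read off the regression: under the standard LDSC model, the expected chi-square statistic of SNP j is N * h2 * l_j / M + N * a + 1, where N is the GWAS sample size, M the number of SNPs, l_j the LD score and a the contribution of confounding. The sketch below fits this line by ordinary least squares. This is a deliberate simplification: the LDSC software uses weighted regression with block-jackknife standard errors, and the function name here is hypothetical.

```python
import numpy as np

def ldsc_h2(chi2, ld, n_samples, n_snps):
    """Simplified, unweighted LD score regression (illustration only).

    Returns (h2_estimate, intercept). An intercept above 1 suggests
    confounding such as population stratification.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld, dtype=float)

    # Design matrix: a column of ones for the intercept, then the LD scores.
    X = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)

    # The slope equals N * h2 / M, so rescale it to obtain h2.
    return slope * n_snps / n_samples, intercept
```

The intercept and slope separate the two sources of inflation: confounding raises all test statistics equally, whereas polygenic signal raises them in proportion to the LD score.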

Following similar work by the Pan-UK Biobank, we will compute in-sample, dosage-based LD matrices and LD scores for each of the three major ancestry groups in SG100K. These LD resources will make it possible to re-analyze Asian GWAS results, including:

– LD score regression analysis to estimate heritabilities

– Fine-mapping analysis to identify causal variants of well-powered complex traits

The plan is to make the LD matrices available in Hail’s BlockMatrix format or similar. LD scores will also be available in LDSC-compatible flat files (.l2.ldscore.gz and .l2.M_5_50).
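For orientation, a minimal sketch of how such flat files could be read with the Python standard library. It assumes the usual LDSC layout: a tab-delimited, gzip-compressed table with a header containing at least SNP and L2 columns, and a one-line file of SNP counts. The function names are hypothetical, and the released SG100K files may carry additional columns.

```python
import csv
import gzip

def read_ldscores(path):
    """Return {SNP identifier: LD score} from an .l2.ldscore.gz file.

    Assumes a tab-delimited header with at least 'SNP' and 'L2' columns.
    """
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["SNP"]: float(row["L2"]) for row in reader}

def read_snp_counts(path):
    """Return the SNP counts stored in an .l2.M_5_50 file.

    The file is assumed to hold whitespace-separated numbers on one line,
    one per annotation: the number of SNPs with minor allele frequency
    above 5% that contributed to the LD scores.
    """
    with open(path) as fh:
        return [float(value) for value in fh.read().split()]
```

The count from the .l2.M_5_50 file is the M that converts a regression slope into a heritability estimate.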