Mapping of cis-acting expression quantitative trait loci in human scalp hair follicles

Background The association of molecular phenotypes, such as gene transcript levels, with human common genetic variation can help to improve our understanding of interindividual variability of tissue-specific gene regulation and its implications for disease. Methods With the aim to capture the spectrum of biological processes affected by regulatory common genetic variants (minor allele frequency ≥ 1%) in healthy hair follicles (HFs) from scalp tissue, we performed a genome-wide mapping of cis-acting expression quantitative trait loci (eQTLs) in plucked HFs, and applied these eQTLs to help further explain genomic findings for hair-related traits. Results We report 374 high-confidence eQTLs found in occipital scalp tissue, whose associated genes (eGenes) showed enrichments for metabolic, mitotic and immune processes, as well as responses to steroid hormones. We were able to replicate 68 of these associations in a smaller, independent dataset, in either frontal and/or occipital scalp tissue. Furthermore, we found three genomic regions overlapping reported genetic loci for hair shape and hair color. We found evidence to confirm the contributions of PADI3 to human variation in hair traits and suggest a novel potential candidate gene within known loci for androgenetic alopecia. Conclusions Our study shows that an array of basic cellular functions relevant for hair growth are genetically regulated within the HF, and can be applied to aid the interpretation of interindividual variability on hair traits, as well as genetic findings for common hair disorders. Supplementary Information The online version contains supplementary material available at 10.1186/s12895-020-00113-y.


Background
Human hair traits show wide interindividual variability, which has been suggested to be largely determined by genetic factors [1,2].
Systematic gene identification efforts for a growing number of quantitative and complex human traits have shown that the majority of associated genetic factors are located in non-coding genomic regions [3]. These variants most probably exert their functional effects through the tissuespecific modulation of the expression of trait-relevant genes. Expression quantitative trait loci (eQTL) analyses, that correlate sequence variation with gene expression data, have proven a valuable tool in terms of delineating the tissue-specific architecture(s) of gene regulation and predicting the impact that trait-associated variants exert on it [4]. This approach therefore bridges the gap between a genetic association finding and the underlying biological mechanism, and may provide crucial insights into disease development.
Increased knowledge of the genetic factors that contribute to variability of gene regulation in the human scalp hair follicle (HF) will aid the interpretation of genetic findings for hair-related traits and hair loss disorders, such as androgenetic alopecia (AGA). Moreover, the comparison of the regulatory architecture between HFs from different scalp areas may aid the understanding of the variable susceptibility of hair follicle subpopulations to hormonal hair loss. The aims of the present study were to: 1) perform a systematic mapping of eQTLs in the human hair follicle, and 2) evaluate the potential of these eQTLs in terms of the functional annotation of genetic loci that contribute to the development of hair-related traits and common diseases.

Sample collection
About 50 HFs were plucked from the occipital scalp of 100 (discovery sample), and the frontal and occipital scalp of 25 (replication sample, previously described in [5]) unrelated male volunteers. Peripheral venous blood samples were collected from all study participants. All volunteers were German residents of European descent, and showed a collective mean age of 27.9 years.
Extraction of nucleic acids DNA was extracted from whole blood samples using the Chemagic Magnetic Separation Module I (Perkin Elmer Chemagen Technology Inc., Baesweiler, Germany). Total RNA was extracted from HFs using the RNeasy Micro Kit (Qiagen, Hilden, Germany), and the quality and quantity of the RNA were assessed using a BioAnalyzer 2100 (Agilent Technologies, Waldbronn, Germany), and a NanoDrop ND-1000 spectrophotometer (Peqlab Biotechnologie, Erlangen, Germany), respectively. Only total RNA samples with an RNA integrity number (RIN) ≥ 8 were further analyzed in the study.
Preparation of genotype data SNP array raw data was initially analyzed using the Genotyping module within the GenomeStudio software (Illumina). Genotype calls were exported for basic quality control in PLINK v1.9 ([6]; www.cog-genomics.org/plink/1. 9/) to eliminate bad quality data (e.g. SNPs and individuals with high degree of missing data, very rare SNPs) prior to genotype imputation. Imputation was performed on the Michigan Imputation Server [7] using the 1000 Genomes Project Phase 3 v5 reference panel and Eagle v2.3 phasing [8]. Post-imputation data processing was performed using VCFtools [9] and quality control was carried out in PLINK 1.9. Briefly, only biallelic single nucleotide variants with high imputation quality score (Rsq) > 0.7, minor allele frequency (MAF) ≥ 1% and under Hardy-Weinberg equilibrium (HWE p > 1 × 10 − 8 ) were further analyzed. A principal component analysis (PCA) was performed with PLINK to identify potential outlier samples and use the generated principal components (PCs) as covariates for eQTL analysis. Two final autosomal genotype datasets consisted of 5,887,234 SNPs and 98 individuals for the discovery sample, and 1,044,566 SNPs and 24 individuals for the replication sample.

Preparation of gene expression data
Raw data from the expression microarrays was initially analyzed using the Gene Expression module within the GenomeStudio software to generate calls and detection p-values. The probe-level gene expression data was exported for pre-processing by background correction, quantile normalization, log2 transformation, probe quality filtering and identification of potential outlier samples by PCA using R. Probes were considered expressed when showing a detection p-value < 0.01 in at least 5% of the samples. Probe quality filtering included the retention of only "good" and "perfect" quality probes mapping to only one gene with a valid identifier, according to annotations retrieved from the illuminaHumanv4.db package [10]. Three final gene expression datasets consisted of 13,217 probes and 98 individuals for the discovery sample, and 13,091 probes from frontal scalp and 12, 814 probes from occipital scalp and 24 individuals for the replication sample.

eQTL mapping
Genome-wide associations in cis (1 Mb window) between the expression levels in scalp HFs and SNP genotypes were tested using QTLtools [11]. Three covariate files prepared for the analyses included the first 10 (discovery sample) or 5 (replication sample) PCs for the genotype datasets and the first 10 or 5 PCs for the phenotype datasets. Initially, the full spectrum of eQTLs for each of all three gene expression datasets was identified through nominal pass analysis. After exploration of the nominal significant results (p < 0.05), only those eQTLs with false discovery rate (FDR) < 1 × 10 − 4 in the discovery sample were considered true eQTLs, while all eQTLs with p < 0.01 in the replication sample were retained for further analyses. To identify independent signals within our set of true eQTLs, a permutation pass analysis (1000 permutations), followed by a conditional pass analysis on grouped phenotypes (i.e. a gene-level output from the probe-level analysis) were applied.

Variant annotation
We explored reported eQTL effects for our independent HF eQTLs using the Variant Annotation tool from SNiPA (Single Nucleotide Polymorphisms Annotator) ( [12]; http://snipa.org), noting whether or not the HF eQTL has been previously reported to have cis-eQTL effects in at least one tissue, and whether or not eQTL effects have been observed on the same gene as in our study. Additionally, we searched for trait associations of our true HF eQTLs that have been reported in the GWAS Catalog ( [13]; https://www.ebi.ac.uk/gwas/).

Replication of HF eQTLs
The true eQTLs identified from the nominal pass analysis in the discovery sample were considered replicated when the same eQTL SNP (eSNP) was found associated to the same eGene, either through the same or a different probe, with p < 0.01 in either frontal and/or occipital scalp areas from the replication sample.

Differential eQTLs between frontal and occipital scalp
To investigate differential regulatory effects between frontal and occipital scalp areas, non-overlapping eQTLs with p-value < 5 × 10 − 5 were investigated in the replication sample. These included eSNPs that showed different effects between both datasets (i.e. association to a different eGene or opposing direction of the effect for the same SNP-gene pair; different-effect eQTLs), and those eQTLs that were unique to the frontal or occipital dataset (i.e. no overlaps in eSNPs or eGenes; region-specific eQTLs).

Functional enrichment analysis
The lists of eGenes obtained from the discovery sample and the differential eQTL analysis were submitted for enrichment analysis using the GENE2FUNC function of the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS) platform ( [14]; http://fuma.ctglab.nl/). Each analysis used the provided option to include all human genes as background, a significance cutoff of FDR < 0.05 and a threshold of minimum overlapping genes with the gene-sets = 3. In addition, a comparative analysis of biological pathways for the differential eQTLs was performed using the lists of the identified region-specific eGenes and the Gene Enrichment Compare function within the FunRich (Functional Enrichment analysis tool) software [15]. From the resulting pathway analysis using the default FunRich database, only frontal-and occipital-specific terms with at least 2 genes from the dataset present in the pathway term, and p < 0.01 (from hypergeometric test) in one scalp area and p > 0.05 in the other, were retained for the purposes of the present study.

Overlaps with reported genetic findings for hair phenotypes
To test the informativeness of our HF eQTLs for the interpretation of genetic findings for hair phenotypes, we used three sets of published genome-wide significant variants (p < 5 × 10 − 8 ) associated with (i) hair shape (discovery meta-analysis, supplementary table 2 from [1]), (ii) hair color (meta-analysis, supplementary table 2 from [2]), and (iii) AGA (reported lead SNPs from [16][17][18][19][20][21][22][23][24][25][26]). These three variant sets were subjected to analysis by the SNP2GENE function of FUMA GWAS to assign GWAS findings to genomic regions. The analysis for each phenotype was set to include SNPs from the 1000 Genomes Project (Phase 3, European population) [27] that are in linkage disequilibrium (LD) with the reported GWAS variants. LD blocks were set to include variants of minor allele frequency ≥ 0.01, r 2 ≥ 0.6 and a distance < 500 kb for merging into a locus. Afterwards, we searched for overlaps between our true HF eQTLs and the resulting loci for hair shape, hair color and AGA.

Hair follicle eQTLs
We identified almost 3 million nominally significant eQTL signals in HFs from the occipital scalp (Fig. 1a). However, when we plotted the p-values and distances to the transcription start site (TSS) of the probe for each top SNPprobe pair with FDR < 0.05, we observed that a large number of SNPs were lying over 250 kb from the probe's TSS, suggesting many of these might actually be noise signals (false positives). For this reason, only eQTL-findings that showed an FDR < 1 × 10 − 4 were considered the likely true eQTLs (14,497 probe-SNP pairs) (Fig. 1b). From this set of true eQTLs, we identified a total of 374 independent HF eQTLs (Fig. 2a . Additionally, the annotation of reported eQTL effects with SNiPA showed that 266 of 374 independent eSNPs have been previously reported to have cis-eQTL effects in at least one other tissue, from which 228 were reported to affect the same eGene. Annotation of GWAS Catalog trait associations identified 13 independent HF eSNPs that are associated with 15 traits, including type 2 diabetes mellitus and chronic inflammatory diseases. From the set of true HF eQTLs, 185 eSNPs showed associations with 134 traits, with the highest number of associations found for body mass index (associated c Representative pathways enriched for the list of eGenes of true eQTLs. d SNP overlaps between the true hair follicle eQTLs (FDR < 1 × 10 − 4 ) and the relaxed frontal and occipital eQTLs (p < 0.01) datasets. e Gene overlaps between the true hair follicle eQTLs and the relaxed frontal and occipital eQTLs datasets with 29 eSNPs) and blood protein levels (associated with 13 eSNPs) (data not shown).
We achieved the replication of 68 independent signals ( Fig. 2) in either occipital and/or frontal scalp HFs from a total of 255 overlapping eQTLs (Suppl. Table.3 [Supplementary Tables]). In total, 188 and 157 eSNPs, and 264 and 276 eGenes, overlapped between our true eQTLs and the frontal and occipital HF eQTLs from the replication sample, respectively. Although only 92 eSNPs overlapped between all three eQTL datasets, these accounted for 249 eGenes (Fig. 1d and e); however, not all eQTL effects were consistent between the three datasets.

Differential HF eQTLs between frontal and occipital scalp areas
We included all eQTLs with p < 5 × 10 − 5 in the analysis of the replication sample. However, as the power to detect associations is markedly reduced due to the limited sample size (N = 24), this is reflected in the number and statistical significance of eQTLs that we detected in frontal and occipital scalp, as well as in their distributions with respect to the probe's TSS (Suppl. Figure.1 [Supplementary Figures]). In general, we found little overlap between these two datasets, and will limit ourselves here to briefly present an overview of regional differences between frontal and occipital scalp.
We identified 71 HF eQTLs with inconsistent effects between frontal and occipital HFs. These were considered "different-effect eQTLs" and affected 11 genes (Suppl. Ta Table 1. To identify differentiallyenriched pathways, we performed a separate FUMA pathway analysis for frontal and occipital differential eGenes (i.e. region-specific + different-effect) and compared the results for both scalp regions. This analysis suggested differences in the genetically-determined regulation of the responses to steroid hormones, cell cycle control, cellular metabolism and immune functions between the scalp regions (Suppl. Figure.2 [Supplementary figures]). However, as several of the region-specific pathway terms seemed redundant, we sought to further elucidate the differences between both scalp regions through a comparative pathway analysis using FunRich. This analysis suggested that the more important differences between frontal and occipital scalp HFs are on pathways related to the metabolism of amino acids and signaling by histone deacetylases (HDACs)  The top 20 hair follicle eQTLs obtained from human occipital scalp in the discovery sample and the top 10 hair follicle eQTLs obtained from frontal and occipital scalp in the replication sample are shown eQTLs expression quantitative trait loci, eSNP SNP with eQTL effect, Chr chromosome, A1 effect allele, A2 other allele, P p-value, FDR false discovery rate in the frontal scalp, and to proliferative processes in the occipital area (Table 2).

Overlaps with genomic regions associated with hair phenotypes
To illustrate the applicability of our set of true eQTLs to interpret GWAS findings for hair phenotypes, we investigated for a potential overlap of HF eSNPs with genomic risk loci for AGA (98 loci), hair shape (12 loci) and hair color (122 loci) ( Table 3). We identified one genomic locus at 1p36.13 that had previously been associated with hair phenotypes. This comprised 31 eSNPs overlapping association signals for hair shape (overlapping lead SNP: rs11203346) and hair color (overlapping lead SNP: rs72646785) (Fig. 3a). For this locus, the nearest genes were mapped to PADI3 and PADI4 in both GWAS, while the eGene for this region corresponded to PADI3. Moreover, another locus at 4q21.21, comprised by 26 eSNPs linked to the signal of rs6533756 (a non-eSNP in LD r 2 ≥ 0.8 with all eSNPs), was associated with hair shape. The overlapping eSNP most strongly associated with hair shape at this locus was rs7695038 (p = 3.9 × 10 − 10 ). While eSNPs at this locus locate within a region coding for the FRAS1 gene, the associated eGene, ANXA3, was located 66,182 bp downstream of FRAS1 (Fig. 3b). A third locus at 17q25.3 comprised the eSNP, rs8070929, which was in LD (r 2 ≥ 0.97) with rs34872037 (a non-eSNP) that is associated with hair color. While the nearest gene for the SNP at this locus was NPLOC4, the HF eGene corresponded to TSPAN10 (Fig. 3c). All eSNPs overlapping association signals for hair shape and/or hair color can be found in Suppl. Table.7. We found no overlaps of our true HF eQTL loci with the reported genome-wide significant findings for AGA. However, we observed that the eGene ATP2B4 was annotated as the nearest gene to an AGA genomic locus at 1q32.1. Moreover, it is perhaps worth to notice that, at the nominal significance level (p < 0.05), we found 93 eGenes to overlap with reported AGA candidate genes, of which 18 were associated with eSNPs at an FDR < 0.05 (ALPL, B3GNT8, BCL2, DIP2B, FAM136A, HDAC9, LCLAT1, LHPP, NSF, PRRX1, RPTN, RSPO2, SUCNR1, TCHH, TMEM50A, TWIST2, MSIG2 and SPAG17).

Discussion
The strongest eQTL associations in occipital HFs were observed for IPO8 (rs7326), ATP5MD (rs2271751), and C17orf97/LIAT1 (rs11150881). The findings for ATP5MD were confirmed in our small replication study. Although we were not able to replicate the "true" SNP-gene associations for IPO8 in our replication sample, a set of different SNPs were indeed associated with IPO8 expression in HFs from frontal and/or occipital scalp areas, some of which were also associated with IPO8 expression in the discovery Table 2 Best frontal-and occipital-specific biological pathways enriched for differential eGenes eQTLs expression quantitative trait loci, GeneSet number of total genes in pathway term, n number of genes from the study overlapping the pathway term, P p-vlaue sample at the nominal level (data not shown), therefore confirming the genetic regulation of IPO8 in HFs. While little is known about the function of C17orf97, IPO8 mediates the nuclear import of proteins and mature microRNAs [28]. ATP5MD is crucial for the maintenance of ATP synthase in mitochondria, and might actively participate in the cellular energy metabolism, a process with well-known relevance to hair biology and hair growth [29,30]. Overexpression of ATP5MD causes a number of mitochondrial abnormalities and an increase in anaerobic metabolism associated with the induction of an epithelial to mesenchymal-like transition, as well as delayed cell growth [31,32]. The genetically-controlled regulation of mitochondrial function in human HF is further supported by our pathway analysis that found an enrichment of eGenes for true eQTLs in pathways related to mitochondrial function along with other pathways, such as the regulation of responses to steroid hormones, the Wnt/β-catenin and interferon (IFN) signaling, adipogenesis, immune responses and the metabolism of glucose and lipids, all of which have well-known roles in HF biology [33]. We also investigated whether there might be differences in the genetic control of gene expression between HF subpopulations from different scalp areas. Despite the reduced size of this (replication) sample, our results point to interesting avenues for future research. For instance, an important difference between frontal and occipital scalp appears to be the metabolism of amino acids, including arginine degradation. It is known that the HF is dependent on arginine, as hair growth depends on the vasculature and L-arginine not only participates in cell proliferation but is a precursor for the vascular mediator nitric oxide. L-arginine deficiency has been shown to impair hair elongation, while its supplementation increases the number of HFs in anagen (growth) and decreases that of HFs in telogen (resting) phase [33,34]. This supports the notion that regional increases in arginine degradation might result in enhanced vulnerability to hair loss in the frontal scalp area. A similar scenario can be thought for HDACs. HDACs are important transcriptional repressors that act in multiprotein complexes and are involved in the control of cell cycle progression [35,36]. While class I HDACs are ubiquitously expressed, class II HDACs show tissue specificity [36]. In our study, we found eQTLs for HDACs 2, 5 and 7 only in frontal scalp. Interestingly, these particular HDACs modulate HF development and homeostasis [37,38], as well as angiogenesis and vascular integrity [39][40][41]. Moreover, androgen actions have been shown to regulate HDAC7 subcellular compartmentalization, and HDAC7 has been proposed to be a co-repressor of the androgen receptor [42].
We also found evidence for differential regulation of vitamin C (L-ascorbic acid) metabolism in frontal scalp. It has been shown that a derivative of L-ascorbic acid (L-ascorbic acid 2-phosphate) promotes HF growth that is mediated by the induced expression of insulin-like growth factor-1 (IGF-1) in dermal papilla cells [43]. This opens the possibility that decreased expression of genes involved in the metabolism of L-ascorbic acid in frontal  Hair follicle eQTLs overlapping genetic findings for hair-related traits. Linkage disequilibrium plots highlighting: a 31 HF eQTLs that overlapped a genomic locus shared by hair shape and hair color, b 26 HF eQTLs that overlapped a genomic locus for hair shape, with the most strongly associated SNP used as sentinel (in blue), and c one HF eQTL (blue) that overlapped a genomic locus for hair color. GWAS loci were defined through an analysis with the FUMA SNP2GENE function, applied separately to the genome-wide findings of GWAS for hair shape and hair color. The plots were generated using SNiPA. For the plot in a, rs72646785 (independent HF eQTL for the region) was used as the sentinel SNP (blue). The lead SNPs within this locus for hair shape (rs11203346) and hair color (rs72646785) are separated from each other by 2650 bp and have an LD r 2 = 0.99 in the 1000 Genomes Project, Phase 3 v5, European population reference panel growth factor signaling, transforming growth factor beta receptor), considering what Bernard has referred to as "the Yin Yang of the human hair follicle" [33]. Nevertheless, the implications of genetic regulation in frontal scalp for hair loss disorders should be investigated through the generation of a confident eQTL dataset with increased power in future studies. With our study, we also show that tissue-specific eQTL data are a valuable resource to identify regulatory effects at disease-and trait-associated loci. In particular, our results suggest an important role of PADI3 in hair traits. Indeed, PADI3 is located in the inner root sheath and medulla in anagen HFs and has been reported to play roles in HF differentiation [44] and hair shaft formation [45]. Our results also implicate ANXA3 and TSPAN10 as novel candidate genes for hair shape and color, respectively. Although we found no overlap between the true HF eQTLs and AGA genetic risk loci, which might be an expected finding, considering that the occipital scalp is not susceptible to balding, the overlapping eGene ATP2B4 provides a potential novel candidate gene for AGA, as the reported gene for the region is so far SOX13 [25,26].
The most obvious limitation of our study is the sample size. However, we tried to overcome this limitation by applying high quality standards to the data and stringent selection criteria to our HF eQTL results. Another potential limitation of our study resides in the use of plucked HFs instead of intact HFs. However, a skin biopsy is required in order to obtain intact HFs, whereas hair plucking is a less invasive technique. Moreover, it has been demonstrated that plucked hairs retain most epithelial structures, maintain the integrity of the outer root sheet and also contain stem cells [46]. Finally, due to the small size of our samples, particularly that of the replication sample, we considered advisable to exclude from the present study indels and sex chromosomes, and limit ourselves to reporting, in very general terms, differential findings between HFs from frontal and occipital scalp areas.

Conclusions
Our analyses demonstrate that well-established HF molecular pathways are genetically regulated and that, to some extent, this regulation can show regional specificity within scalp HFs. The enriched pathways mainly underscored processes that relate to hair growth and the HF cycle. More detailed tissuespecific analyses will be enabled by future increases in sample size, ensuring an improved understanding of the genetically determined variability in HF gene expression and its implications for hair-related traits and hair loss disorders.