Population Selection and Quality Control in the Study of Haplotype Sharing in Finland
Rare variants that are not thought to be under natural selection may still cause some diseases. Because such variants are inherited from generation to generation, they also carry signatures of population history. Finland is a convenient example of a population with such signatures. This population, which is regionally clustered and long settled, has been found to carry several heritable diseases, such as familial hypercholesterolemia, that are rare in other populations. These heritage diseases are direct results of founder effects and genetic isolation. To study how these phenotypes spread, the study used haplotype-based methods.
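To make "haplotype sharing" concrete, the toy Python sketch below scans two phased haplotypes for long runs of identical alleles. It is an illustration under stated assumptions, not the study's method: the function name `shared_segments`, the 0/1 allele encoding, marker positions in centimorgans, and the 2 cM default threshold are all hypothetical. Real identity-by-descent detection must also tolerate genotyping and phasing errors, which this exact-match scan ignores.

```python
from typing import List, Optional, Tuple


def shared_segments(
    hap_a: List[int],
    hap_b: List[int],
    pos_cm: List[float],
    min_cm: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start_cm, end_cm) for each maximal run of identical alleles
    between two phased haplotypes that spans at least min_cm.

    Hypothetical toy example: exact matching only, no error tolerance.
    """
    if not (len(hap_a) == len(hap_b) == len(pos_cm)):
        raise ValueError("haplotypes and positions must have equal length")

    segments: List[Tuple[float, float]] = []
    run_start: Optional[int] = None  # index of the first marker in the current run

    # Iterate one step past the last marker; that extra step is treated as a
    # mismatch so that a run reaching the end of the haplotype is still closed.
    for i in range(len(hap_a) + 1):
        same = i < len(hap_a) and hap_a[i] == hap_b[i]
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            start, end = pos_cm[run_start], pos_cm[i - 1]
            if end - start >= min_cm:
                segments.append((start, end))
            run_start = None
    return segments
```

For example, haplotypes `[0, 1, 1, 0, 1, 0, 0, 1]` and `[0, 1, 1, 0, 0, 0, 0, 1]` at positions `[0.0, 1.0, 2.5, 4.0, 5.0, 6.0, 8.0, 9.5]` differ only at the fifth marker, so the call returns `[(0.0, 4.0), (6.0, 9.5)]`. Longer shared segments generally point to a more recent common ancestor, which is why segment length is informative about population history.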
Archaeological evidence indicates multiple migration events in Scandinavia. The researchers therefore also collected data from geographically or linguistically neighboring countries, including Sweden, Estonia, Russia, and Hungary. They integrated biobank-scale genetic data with detailed birth-record data, accumulating data from 43,254 Finnish individuals (~0.8% of Finland's total population) and 16,060 individuals from the neighboring countries.
The exome sequencing data were quality-controlled by removing any individual with more than 10% missing genotypes, measured over sites with allele frequency above 0.001 and site-level missingness below 10%. Genotypes were retained only if they had genotype quality (GQ) above 20, read depth above 10X, and allele balance above 0.2. Moreover, because allele frequency has been used as a predictor of pathogenicity, only SNPs with minor allele frequency (MAF) > 0.05 were kept. Finally, haplotype segments falling partially or fully within difficult chromosomal regions were excluded: telomeres, centromeres, the short arms of acrocentric chromosomes, heterochromatic regions, and gaps in the reference assembly.
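The filtering steps above can be sketched as a small Python pipeline. This is a hypothetical illustration, not the authors' code: the `Call` record, all function names, the half-open interval convention, applying the genotype-level filters before the sample-missingness filter, reading "allele frequency > 0.001" as alternate-allele frequency, and checking allele balance only on heterozygous calls are all assumptions. The numeric thresholds follow the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Genotypes = List[List[Optional[int]]]  # rows = sites, columns = samples


@dataclass
class Call:
    gt: Optional[int]  # alternate-allele count (0, 1, 2); None if uncalled
    gq: int            # genotype quality
    dp: int            # read depth
    ab: float          # fraction of reads supporting the alternate allele


def genotype_qc(c: Call) -> Optional[int]:
    """Keep a call only if GQ > 20, DP > 10 and AB > 0.2; otherwise mark it missing."""
    if c.gt is None or c.gq <= 20 or c.dp <= 10:
        return None
    # Assumption: allele balance is checked only for heterozygous calls.
    if c.gt == 1 and c.ab <= 0.2:
        return None
    return c.gt


def site_stats(row: List[Optional[int]]) -> Tuple[float, float]:
    """Return (alternate-allele frequency, fraction of missing calls) for one site."""
    called = [g for g in row if g is not None]
    missing = 1 - len(called) / len(row)
    af = sum(called) / (2 * len(called)) if called else 0.0
    return af, missing


def keep_samples(gts: Genotypes, max_missing: float = 0.10) -> List[int]:
    """Indices of samples with <= max_missing missingness, measured only over
    sites with allele frequency > 0.001 and site missingness < 10%."""
    good_sites = []
    for row in gts:
        af, miss = site_stats(row)
        if af > 0.001 and miss < 0.10:
            good_sites.append(row)
    n_samples = len(gts[0]) if gts else 0
    kept = []
    for j in range(n_samples):
        n_missing = sum(row[j] is None for row in good_sites)
        if not good_sites or n_missing / len(good_sites) <= max_missing:
            kept.append(j)
    return kept


def keep_common_sites(gts: Genotypes, min_maf: float = 0.05) -> List[int]:
    """Indices of sites whose minor-allele frequency exceeds min_maf."""
    kept = []
    for i, row in enumerate(gts):
        af, _ = site_stats(row)
        if min(af, 1 - af) > min_maf:
            kept.append(i)
    return kept


def outside_masked(segment: Tuple[int, int], masked: List[Tuple[int, int]]) -> bool:
    """True if the half-open segment [start, end) overlaps none of the masked
    intervals (telomeres, centromeres, acrocentric short arms, heterochromatin, gaps)."""
    start, end = segment
    return all(end <= m_start or start >= m_end for m_start, m_end in masked)
```

In use, one would map `genotype_qc` over every call, then apply `keep_samples` and `keep_common_sites`, and finally drop any inferred haplotype segment for which `outside_masked` returns `False`. Because any overlap fails the check, segments that fall only partially inside a masked region are excluded too, matching the "partially or fully" rule.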
Strengths and Weaknesses
Compared with previous studies, this design better captures continuous, multi-generational migration trends. However, as the authors mention in their introduction, the method was established theoretically when sample sizes were relatively small, so it may not be well suited to the much larger datasets of the modern genomics era.