Study of Genetic Diversity of Sheep Breeds in Afghanistan Using SNP Markers

Biodiversity is usually described in terms of three intimately connected levels: species diversity, genetic diversity and ecosystem diversity. The objective of the present study was to analyze genetic diversity among three Afghan sheep breeds (Arabi, Baloch and Gadic) using selected markers. Pairwise comparisons between breeds were conducted to detect and accurately analyze the differences between them. Forty-five blood samples were collected from the three breeds in three districts of Herat province (Shindand, Gulran and Obe). From each animal, 10 μL of blood was collected via the jugular vein into Venoject tubes containing EDTA (ethylenediaminetetraacetic acid) to prevent coagulation and was immediately stored in a refrigerator at 4 °C. DNA was extracted from blood using the GenElute™ Blood Genomic DNA Kit, and DNA concentration was determined using a NanoDrop ND-1000 spectrophotometer. A total of 15 Arabi, 15 Baloch and 15 Gadic sheep were genotyped at 53862 SNP loci with the Ovine SNP50K BeadChip (http://www.illumina.com); in general, only SNPs that had been assigned to the 26 autosomes and the X chromosome were analysed. SNPs with a minor allele frequency (MAF) below 2% over all animals were removed, as were SNPs with a call rate (the proportion of sheep for which the marker was successfully genotyped) ≤ 95% (Teo YY, Fry AE, Clark TG, Tai ES, Seielstad M: On the usage of HWE for identifying genotyping errors. Annals of Human Genetics 2007, 71:701-703). Haplotype block analysis in the studied regions, the decay of these blocks and LD plots were produced using Haploview v4.2. The input for this software consisted of the marker genotypes in the studied regions, and LD was measured as the correlation coefficient (r²) between each SNP and its surrounding SNPs.
Given the excess of heterozygosity within the studied breeds, one can conclude that these breeds are not threatened by a decline in heterozygosity and can be regarded as a suitable genetic reserve for husbandry and breeding purposes in Afghanistan. Furthermore, the high heterozygosity on the studied chromosomes in the Arabi, Baloch and Gadic breeds suggests high within-population diversity despite the breeding activities carried out on this livestock, which indicates that management plans have succeeded in limiting genetic uniformity and keeping diversity at an acceptable level.

Genetic diversity plays an important role in the long-term survival of most species. It arises at the molecular level and underlies the past, present and future development of agriculture and animal production, so knowledge of farm animal populations and their genetics is very important in animal breeding (Esmail-Khanian et al., 2007).
Genetic diversity refers to the diversity of genes within a single species. It can arise through random mutation at the molecular level.
Genetic diversity is the variation in heritable characteristics within a population of the same species. It plays a significant role in evolution by allowing a species to adapt to a new environment and to resist parasites.
Genetic diversity is essential for the sustainability of livestock (and other) species for several reasons. Within-breed genetic diversity is needed for the long-term genetic improvement of livestock breeds and for selecting new features or attributes in a changing environment. It is also important for avoiding inbreeding, which lowers performance.

Genetic diversity between breeds
Local breeds are used to support and maintain genetic diversity in high-performing animals. Local breeds have specific social and economic value, and they are well adapted to the toughest conditions. Furthermore, local breeds are part of our cultural heritage (Gandini & Villa, 2003).
The genetic diversity found in livestock allows livestock keepers to help their animals withstand disease, environmental change, market shifts and other pressures that may be impossible to predict. Most local breeds are rare today because of the loss of production and the lack of a good market. The Finn sheep, for example, was cast aside by commercial breeders decades ago and kept only by Finnish peasants.

Molecular markers and marker assisted selection
Molecular markers are useful and accurate tools that can replace traditional, classical genetic techniques for improving breeding programmes and for differentiating within breeds. They can be a better source of information for estimating genetic diversity when pedigree data are missing or contain errors. Indeed, even when pedigree information is available, markers may allow genetic diversity to be estimated more precisely.
Genetic markers are differences in the DNA of a chromosome that are transmitted from parent to offspring. They are called genetic markers because they can be used to differentiate individuals, populations, breeds or species from one another. A genetic marker must be polymorphic (variable) and heritable. In the past, genetic diversity was studied with markers such as allozymes, which rely on protein variants of enzymes; because of their low number of loci and low level of polymorphism, other markers have taken over.
SNP (single nucleotide polymorphism) markers, which assess genome-wide genetic variation, provide new possibilities for studying genetic diversity and selection and for QTL analysis. SNPs are now used in genomic selection in livestock breeds (Zenger et al., 2007; Muir et al., 2008; Kijas et al., 2009).
QTL are stretches of DNA linked to the genes that underlie a quantitative trait. Regions of the genome containing genes that affect quantitative traits are mapped using molecular markers such as AFLPs or, most often, SNPs.
This mapping is an early step in identifying and sequencing the actual genes that underlie the variation in a trait. Quantitative traits are phenotypes that vary in degree and can be ascribed to polygenic effects.

Advantages of genetic diversity estimation with SNP markers
In addition to pedigree information, SNP markers tell us about the DNA itself. SNP markers provide more information than pedigree charts, and that information is more accurate. Therefore, if pedigree information is missing, SNP markers can supply the data. Combining pedigree and SNP data is a good way to estimate genetic diversity. SNPs also let us examine genetic diversity at the genome level, so that regions of low and high diversity can be identified. Once low-diversity regions have been identified, they can be conserved (VanRaden, 2007), because such regions can readily be studied, monitored and controlled.
$H_T$ is the total gene diversity, or expected heterozygosity, in the whole population; $H_I$ is the within-population gene diversity, or average observed heterozygosity in a group of populations.
$H_S$ is the average expected heterozygosity in each subpopulation. $F_{ST}$ values in the range 0 to 0.05 indicate little genetic differentiation, and values in the range 0.05 to 0.25 indicate greater genetic differentiation.
F indices make the analysis of subpopulations possible. They can be used to measure genetic distances between populations, on the assumption that mating subpopulations have allele frequencies that differ from those of the total population (Krap et al., 1998).

Data collection and DNA extraction
Forty-five blood samples were collected from three Afghan sheep breeds (Arabi, Baloch and Gadic) in three districts of Herat province (Shindand, Gulran and Obe). From each animal, 10 µL of blood was collected via the jugular vein into Venoject tubes containing EDTA (ethylenediaminetetraacetic acid) to prevent coagulation, and the samples were immediately stored at 4 °C in a refrigerator. DNA was extracted from blood using the GenElute™ Blood Genomic DNA Kit. DNA concentration was determined using a NanoDrop ND-1000 spectrophotometer.

Genotyping using Ovine 50K SNP Chip and data mining
A total of 15 Arabi, 15 Baloch and 15 Gadic sheep were genotyped at 53862 SNP loci with the Ovine SNP50K BeadChip (http://www.illumina.com); in general, only SNPs that had been assigned to the 26 autosomes and the X chromosome were analysed. SNPs with a minor allele frequency (MAF) below 2% over all animals were then removed, as were SNPs with a call rate (the proportion of sheep for which the marker was successfully genotyped) ≤ 95% (Teo YY, Fry AE, Clark TG, Tai ES, Seielstad M: On the usage of HWE for identifying genotyping errors. Annals of Human Genetics 2007, 71:701-703).
For the remaining SNPs, extreme departure from Hardy-Weinberg equilibrium (p < 10⁻²) over all animals of a breed was used to identify genotyping errors (Teo et al., 2007). After editing the data, 47326 markers for Arabi vs Baloch, 46284 markers for Arabi vs Gadic and 47159 markers for Baloch vs Gadic were retained for the study. Missing data were replaced with the most frequent allele at the locus concerned. Allele frequencies and observed and expected heterozygosity were calculated for each breed.
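The MAF and call-rate filters and the imputation step described above can be sketched as follows. This is only an illustration under stated assumptions: genotypes are assumed to be coded 0/1/2 (copies of one allele) with NaN for a missing call, "most frequent allele" is interpreted as the homozygote for the major allele, and the function names `qc_filter` and `impute_major` are hypothetical. It is not the pipeline actually used in the study.

```python
import numpy as np

def qc_filter(genotypes, maf_min=0.02, call_rate_min=0.95):
    """Boolean mask of SNPs (columns) passing the MAF and call-rate filters.

    genotypes: animals x SNPs array coded 0/1/2, np.nan = missing call
    (assumed coding, not taken from the source).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    call_rate = called.mean(axis=0)          # share of animals genotyped
    n_called = called.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.nansum(g, axis=0) / (2.0 * n_called)   # allele frequency
    maf = np.minimum(p, 1.0 - p)
    # SNPs with MAF < 2% or call rate <= 95% are removed, as in the text.
    return (maf >= maf_min) & (call_rate > call_rate_min)

def impute_major(genotypes):
    """Replace missing calls with the homozygote for the major allele."""
    g = np.array(genotypes, dtype=float)
    p = np.nanmean(g, axis=0) / 2.0
    fill = np.where(p >= 0.5, 2.0, 0.0)
    rows, cols = np.where(np.isnan(g))
    g[rows, cols] = fill[cols]
    return g
```

The Hardy-Weinberg filter is omitted here; it would add a third per-SNP condition to the mask returned by `qc_filter`.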

Statistical analysis based on LD and haplotype length
A very productive way to detect selection at the genome level is to use analyses based on linkage disequilibrium (LD), because selection for a beneficial allele is accompanied by selection of the loci linked around it.
Unlike single-locus analyses such as FST, LD-based methods depend on the frequency of and distance between SNPs, because they are multi-locus analyses.
In this research, haplotype block analysis in the studied regions, the decay of these blocks and LD plots were produced using Haploview v4.2. The input for this software consisted of the marker genotypes in the studied regions, and LD was measured as the correlation coefficient (r²) between each SNP and its surrounding SNPs.
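To make the r² statistic concrete, the following sketch computes pairwise r² as the squared Pearson correlation between allele dosages (0/1/2) at two SNPs. This genotype-based approximation is an assumption made for illustration: Haploview estimates LD from inferred haplotypes, so its values may differ. The function name `r2_matrix` is hypothetical.

```python
import numpy as np

def r2_matrix(genotypes):
    """Pairwise r^2 between SNPs from an animals x SNPs dosage matrix.

    Uses the squared correlation of 0/1/2 dosages, an approximation to
    haplotype-based r^2. Monomorphic SNPs yield NaN entries.
    """
    g = np.asarray(genotypes, dtype=float)
    r = np.corrcoef(g, rowvar=False)   # columns are the variables (SNPs)
    return r ** 2

# Example: SNP 3 mirrors SNP 1 (complete LD), SNP 2 is independent of SNP 1.
example = np.array([[0, 0, 2],
                    [0, 2, 2],
                    [2, 0, 0],
                    [2, 2, 0]])
ld = r2_matrix(example)
```

Plotting the entries of `ld` against the physical distance between the corresponding SNPs gives an LD decay curve.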

Study of homozygosity and heterozygosity in different breeds
One way to recognize breeds that have been targets of selection is to compare homozygosity across a genomic region. For two distinct breeds, selection can act in two ways. In the first, a beneficial mutation is selected in only one of the breeds, while the other breed is not under selection at that locus. In this case, one of the two breeds is expected to show elevated homozygosity in the genomic region while the other is not.
In the second, different alleles of the same mutation are selected in opposite directions in the two breeds, that is, a different allele is selected in each breed, so both breeds are expected to show elevated homozygosity. In this study, homozygosity for each SNP marker was first scored as 1 for homozygous and 0 for heterozygous genotypes. The average length of homozygosity was then calculated for each SNP, taking neighbouring SNPs into account, in Microsoft Excel 2010, and the corresponding plot was drawn for both sides of the candidate genomic region in each breed. The values in these plots therefore indicate the average length of homozygosity at each SNP given the homozygosity of its neighbouring SNPs. This analysis is analogous to the study of linkage disequilibrium (LD) in the region, with homozygosity serving as an indicator of selection in that genomic region.
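The scoring scheme above (1 for homozygous, 0 for heterozygous, then averaging over neighbouring SNPs) could be reproduced outside Excel roughly as follows. The source does not specify how many neighbouring SNPs were used, so the window half-width here is a hypothetical parameter, and the centred moving average is one possible reading of the procedure, not a confirmed one.

```python
import numpy as np

def homozygosity_scores(genotypes):
    """Score each genotype: 1.0 if homozygous (0 or 2), 0.0 if heterozygous (1)."""
    g = np.asarray(genotypes)
    return (g != 1).astype(float)

def windowed_homozygosity(genotypes, half_width=2):
    """Centred moving average of per-SNP homozygosity within one breed.

    half_width is an assumed parameter: each value averages the SNP itself
    and half_width neighbours on either side. SNPs closer than half_width
    to either end of the region get no value.
    """
    per_snp = homozygosity_scores(genotypes).mean(axis=0)  # mean over animals
    window = 2 * half_width + 1
    kernel = np.ones(window) / window
    return np.convolve(per_snp, kernel, mode="valid")
```

Computing this profile separately for each breed and plotting the two curves over the same candidate region shows whether one breed, or both, carry a run of elevated homozygosity.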

RESULTS AND DISCUSSION
Ninety animals, comprising 30 or 15 samples from each of the Baloch, Arabi and Gadic breeds, were genotyped with Ovine BeadChip arrays.
After initial quality control of the genotype data, two animals (one Arabi and one Gadic) were excluded from subsequent analysis because more than 10% of their genotypes were missing, and 99 animals remained for the subsequent steps. The stages of SNP marker filtering are presented in Table 1. In the Baloch, Arabi and Gadic breeds respectively, 2665, 2659 and 2758 SNP markers were removed because their MAF was below 0.02, and 3870, 3900 and 3797 SNPs were removed because more than 0.05 of their genotypes were missing. In the end, 47327, 47303 and 47307 SNP markers passed all quality control stages in the Baloch, Arabi and Gadic breeds, respectively, and were kept for subsequent analysis. This information was used for principal component analysis (PCA) and for the analysis of population structure and LD structure.

PCA, population structure and population differentiation index
The population structure of the three Afghan sheep populations was examined by PCA of the sample genotypes, using Admixture software. The PCA showed that the studied populations form quite distinct groups on PC1 and PC2. Only one animal, belonging to the Gadic breed, fell away from its own breed group; it did not overlap with the other breeds either and is probably a crossbred from the studied populations. This sample was excluded from subsequent analysis, leaving 44 animals. On PC2, the Arabi, Baloch and Gadic populations are separated from each other. In other studies in species such as cattle, sheep and pigs, animals were likewise classified by breed and geographical region on the basis of eigenvectors 1 and 2 alone (Gibbs et al., 2009; Yang et al., 2012). Given this population diversity, one can expect to find genomic regions that have been under significant selection. Moreover, natural selection during adaptation has also acted on these populations, which gives rise to population differentiation. Adaptation to local environments and artificial selection can alter allele frequencies at particular positions in the genome. In fact, the frequency of favourable alleles increases at selected positions, producing values of the population differentiation index (FST) that are higher than expected (Akey et al., 2002). The population differentiation obtained for the studied populations was low to moderate. The results of the population differentiation analysis confirm the presence of three distinct populations, or breeds.
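As an illustration of how animals are placed on PC1 and PC2, the sketch below runs a PCA on a centred genotype matrix with a singular value decomposition. It is a generic NumPy sketch, not the software workflow used in the study; the 0/1/2 genotype coding and the function name `genotype_pca` are assumptions, and the toy data are invented only to show two groups separating.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an animals x SNPs dosage matrix via SVD.

    Returns (scores, explained): scores holds each animal's coordinates
    on the leading components, explained the share of variance of each.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)              # centre each SNP
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy data: rows 0-2 and rows 3-5 mimic two genetically distinct groups.
toy = np.array([[0, 0, 2, 2],
                [0, 0, 2, 2],
                [0, 1, 2, 2],
                [2, 2, 0, 0],
                [2, 2, 0, 0],
                [2, 1, 0, 0]])
scores, explained = genotype_pca(toy)
```

An animal whose `scores` fall far from its own breed cluster, as reported for one Gadic sample, would stand out in a scatter plot of the first two columns.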

Statistics related to SNP markers and their distribution on Afghani sheep chromosomes
In total, 47327 SNP markers passed the quality control procedures in the Baloch-Arabi breed comparison. In this study, to identify regions of the genome under different selection in the Baloch, Arabi and Gadic breed comparisons, the genomic distribution of FST was plotted for all SNPs on the genome using the unbiased estimator of Weir & Cockerham (the Theta coefficient). One problem with Wright's FST is that sampling error is not taken into account; this problem was solved by Weir & Cockerham. Like Wright's statistic, Theta ranges between 0 and 1; however, because the estimator is unbiased, negative values can be obtained (Akey et al., 2002). The advantage of this method over the basic FST proposed by Wright is that sample sizes are included in the formula, so that the actual sampling error is taken into account (Weir & Cockerham, 2009). Typically, the unbiased Theta estimator is used with small samples and when comparing populations of different sizes (Akey et al., 2002).
In this section, to examine the population differentiation of the Afghan breeds, results from the unbiased FST estimator of Weir & Cockerham (Theta coefficient) are reported. In all comparisons, these results were highly correlated with Wright's FST (correlation coefficient above 99%); as an example, the correlation between the two methods in the Baloch-Gadic breed comparison is shown in Figure 2.
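The sketch below shows how sample sizes enter the Theta coefficient and why it can be negative. It assumes the single-locus, biallelic form of the Weir and Cockerham (1984) variance-component estimator, written from the published formulas as commonly stated; the function name `wc_theta` and its argument layout are hypothetical, and the code should be checked against the original paper before any real use.

```python
import numpy as np

def wc_theta(n, p, h):
    """Single-locus Theta for r populations (assumed biallelic form).

    n: sample sizes, p: allele frequencies, h: observed heterozygote
    proportions, one entry per population.
    """
    n = np.asarray(n, dtype=float)
    p = np.asarray(p, dtype=float)
    h = np.asarray(h, dtype=float)
    r = len(n)
    n_bar = n.mean()
    n_c = (r * n_bar - np.sum(n ** 2) / (r * n_bar)) / (r - 1)
    p_bar = np.sum(n * p) / (r * n_bar)          # weighted mean frequency
    s2 = np.sum(n * (p - p_bar) ** 2) / ((r - 1) * n_bar)
    h_bar = np.sum(n * h) / (r * n_bar)
    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = (n_bar / n_c) * (s2 - (core - h_bar / 4) / (n_bar - 1))  # between populations
    b = (n_bar / (n_bar - 1)) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)  # between individuals
    c = h_bar / 2                                 # within individuals
    return a / (a + b + c)

# Two samples of 15 animals: fixed for opposite alleles vs. identical frequencies.
theta_fixed = wc_theta([15, 15], [1.0, 0.0], [0.0, 0.0])
theta_same = wc_theta([15, 15], [0.5, 0.5], [0.5, 0.5])
```

With identical allele frequencies in both samples, the sample-size correction makes the between-population component slightly negative, which is the source of the negative values mentioned above.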

Heterozygosity degree
Heterozygosity is the most important measure of genetic diversity and can be obtained from the overall frequency of heterozygotes across loci. Figure 3 shows the heterozygosity of the Arabi, Baloch and Gadic breeds on each autosome.
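A per-chromosome heterozygosity summary of the kind plotted in Figure 3 can be sketched as below. The 0/1/2 genotype coding, the complete (imputed) genotype matrix and the function names are assumptions for illustration; observed heterozygosity is taken as the share of heterozygous animals at a SNP and expected heterozygosity as 2p(1 - p).

```python
import numpy as np

def heterozygosity(genotypes):
    """Observed and expected heterozygosity per SNP for one breed.

    genotypes: animals x SNPs array coded 0/1/2 with no missing values.
    """
    g = np.asarray(genotypes, dtype=float)
    h_obs = (g == 1).mean(axis=0)          # share of heterozygous animals
    p = g.mean(axis=0) / 2.0               # allele frequency
    h_exp = 2.0 * p * (1.0 - p)            # Hardy-Weinberg expectation
    return h_obs, h_exp

def mean_by_chromosome(values, chromosomes):
    """Average per-SNP values within each chromosome label."""
    values = np.asarray(values, dtype=float)
    chromosomes = np.asarray(chromosomes)
    return {c.item(): float(values[chromosomes == c].mean())
            for c in np.unique(chromosomes)}
```

Calling `mean_by_chromosome(h_obs, chromosomes)` for each breed gives one value per autosome, which can then be compared across breeds.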

CONCLUSION
In this study, the search for selection signatures was carried out as pairwise comparisons. Pairwise comparisons were conducted to detect differences between breeds and to analyze these differences accurately. Selected regions were then identified using the Theta statistic. Some of the genomic regions identified here as selection signatures have also been reported as selection signatures in previous studies of humans, cattle and other livestock. Genes were identified in the vicinity of the selected regions and SNPs, but the biological role of some of these genes is not fully understood, and some genes may have interactions that are still unknown. In general, accurately identifying the role of these genes will require further, more powerful studies.
Efforts were made to sample purebred animals of the Arabi, Baloch and Gadic breeds when collecting the Afghan breeds. However, because these breeds are kept close to other breeds, some unwanted admixture was expected. PCA was therefore used to examine how the samples cluster into breed groups. The results show that, except for two samples, all animals clustered with their own breed. One of these animals belonged to the Arabi breed: a sex check against the genotype data showed that its recorded sex did not match, so it was excluded from the final analysis. The other sample was from the Gadic breed and was likewise excluded from subsequent analyses.
Given the excess of heterozygosity within the studied breeds, one can conclude that these breeds are not threatened by a decline in heterozygosity and can be regarded as a suitable genetic reserve for husbandry and breeding purposes in Afghanistan. Furthermore, the high heterozygosity on the studied chromosomes in the Arabi, Baloch and Gadic breeds suggests high within-population diversity despite the breeding activities carried out on this livestock, which indicates that management plans have succeeded in limiting genetic uniformity and keeping diversity at an acceptable level.

Suggestions
1. By sequencing the whole sheep genome in each of the three breeds, or at least the genomic regions identified in this paper as selection signatures, future studies could examine LD-based statistics with a higher density of SNP markers and higher accuracy.
2. Analyzing the genomes of other breeds from around the world together with the studied breeds, and jointly analyzing population differentiation, could be effective in finding selection signatures and population structure.
3. The genetic structure of the studied populations could be investigated in greater depth with more detailed data.

Statistical analysis and calculation of population differentiation between breed pairs
Total allele frequencies at each locus, treating all animals as a single population, were calculated as
$$p_T = \frac{\text{pop.1}\, p_1 + \text{pop.2}\, p_2}{\text{pop.1} + \text{pop.2}}$$
where pop.1 = number of individuals in population 1, pop.2 = number of individuals in population 2, and $p_1$ and $p_2$ are the allele frequencies in the two populations. Then the expected heterozygosity within populations ($H_S$) and the overall heterozygosity ($H_T$) were calculated. Finally, $F_{ST}$ was calculated according to Weir and Cockerham (1984):
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
After calculating Fst, the Win5 Fst was calculated as the average of every 4 Fst values, deleting the first two and the last two Fst values on every chromosome (-180 Fst values over the 27 chromosomes). Finally, Manhattan plots of Win5 Fst and Fst were produced.
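The calculation above can be sketched per SNP as follows. Several details are assumptions rather than facts from the text: the within-population heterozygosity is averaged with weights equal to the population sizes, expected heterozygosity is taken as 2p(1 - p), and the Win5 smoothing is interpreted as a centred mean over 5 consecutive SNPs that leaves the first two and last two SNPs of each chromosome without a value. The function names are hypothetical.

```python
import numpy as np

def pairwise_fst(p1, p2, n1, n2):
    """Per-SNP Fst = (Ht - Hs) / Ht for two populations.

    p1, p2: allele frequencies per SNP; n1, n2: numbers of individuals.
    Hs is a size-weighted mean of the two expected heterozygosities
    (assumed weighting). SNPs fixed in the pooled sample give NaN.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_total = (n1 * p1 + n2 * p2) / (n1 + n2)        # pooled allele frequency
    h_t = 2.0 * p_total * (1.0 - p_total)
    h_s = (n1 * 2.0 * p1 * (1.0 - p1) + n2 * 2.0 * p2 * (1.0 - p2)) / (n1 + n2)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(h_t > 0, (h_t - h_s) / h_t, np.nan)

def window_mean(values, size=5):
    """Centred moving average over `size` consecutive SNPs of one chromosome."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(size) / size, mode="valid")
```

`window_mean` would be applied chromosome by chromosome, so that windows never span two chromosomes, before the smoothed values are drawn as a Manhattan plot.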

Fig. 1.
Fig. 2. Comparison of the Fst coefficient (based on the Wright method) and the Theta coefficient (based on the Weir and Cockerham method) for genotyped SNPs in the studied breeds

Table 1. Filtration stages of the genotype data in the different breed comparisons

Table 2. Properties of the SNPs used for comparing the Arabi and Baloch breeds and the distance between them on different chromosomes

Table 3. Properties of the SNPs used for comparing the Arabi and Gadic breeds and the distance between them on different chromosomes

Table 4. Properties of the SNPs used for comparing the Baloch and Gadic breeds and the distance between them on different chromosomes