Diversity analysis based on the STR dataset
A total of 18 STR loci were used to genotype 107 Kazakh Tobet dogs from four populations (Table 1), yielding 193 alleles in total. Pop1 had the highest average number of alleles per locus (Na = 10.22 ± 0.50). Pop2 and Pop3 had moderate values (Na = 7.00 ± 0.34 and 6.11 ± 0.25, respectively), and Pop4 had the lowest (Na = 6.00 ± 0.33). The loci with the highest number of alleles were REN162C04, AHT137, AHTh260, FH2054, AHT121 and AHTh171, each of which had between 12 and 15 alleles. The lowest number of alleles was observed for locus AHTk211, which had 6 alleles. The average number of effective alleles for all analysed Kazakh Tobet dogs was 5.47 ± 0.32 and ranged from 4.14 ± 0.28 in Pop4 to 5.42 ± 0.34 in Pop1. Pop2 had the highest observed heterozygosity (Ho = 0.79 ± 0.03), closely followed by Pop1 (Ho = 0.78 ± 0.03), Pop4 (Ho = 0.77 ± 0.05) and Pop3 (Ho = 0.76 ± 0.04). Pop3 had the highest expected heterozygosity (uHe = 0.83 ± 0.01), while Pop1 (uHe = 0.81 ± 0.01) and Pop2 (uHe = 0.80 ± 0.01) were slightly lower and Pop4 had the lowest value (uHe = 0.77 ± 0.03). The fixation index F was negative, though not significantly so, in Pop2 (uF = -0.02 ± 0.03, 95% CI: -0.07–0.04) and Pop4 (uF = -0.05 ± 0.09, 95% CI: -0.22–0.13), pointing to a slight excess of heterozygotes in these populations. Conversely, non-significant positive F values were observed in Pop1 (uF = 0.03 ± 0.01, 95% CI: -0.19–0.25) and Pop3 (uF = 0.02 ± 0.07, 95% CI: -0.12–0.16), pointing to a slight deficit of heterozygotes. For all analysed Kazakh Tobet dogs combined, F was significantly different from zero but very small (uF = 0.03 ± 0.01, 95% CI: 0.02–0.05), indicating a minimal level of inbreeding.
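To make the diversity measures reported above concrete, the following minimal sketch computes Na, the effective number of alleles, Ho and uHe for a single locus using their standard textbook definitions. It assumes, without confirmation from the text, that these definitions match those of the software used in this study. The function name `locus_diversity` and the toy genotypes are hypothetical and are not data from the Kazakh Tobet dogs.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return Na, Ne, Ho and uHe for one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid dog.
    """
    n = len(genotypes)
    # Count every allele copy (two per dog).
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]

    na = len(counts)                              # number of different alleles
    homozygosity = sum(p * p for p in freqs)      # expected homozygosity
    ne = 1 / homozygosity                         # effective number of alleles
    ho = sum(a != b for a, b in genotypes) / n    # observed heterozygosity
    he = 1 - homozygosity                         # expected heterozygosity
    uhe = (2 * n / (2 * n - 1)) * he              # small-sample (unbiased) correction
    return {"Na": na, "Ne": ne, "Ho": ho, "uHe": uhe}


# Hypothetical genotypes (allele sizes in bp), for illustration only.
toy = [(180, 184), (180, 180), (184, 188), (180, 188)]
stats = locus_diversity(toy)
```

Population-level values such as those in Table 1 would then be obtained by averaging the per-locus results over all 18 loci.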
The Hardy-Weinberg equilibrium (HWE) test, which was performed separately for each locus for all dogs, showed no significant deviation from the expected frequencies (P > 0.05), with the exception of the loci INRA21, AHT137, AHTh260, AHTk253, FH2054, REN162C04 and AHTh171, which each showed a significant deviation with P < 0.001 (Table 2).
The PCoA of the STR data for the four populations identified three main axes of genetic variation, which cumulatively accounted for 4.68% (axis one), 9.06% (axes one and two) and 13.10% (axes one to three) of the total variation (Fig. 2). The analysis showed that all four populations were admixed.
PCoA plot of 107 Kazakh Tobet dogs from four populations based on STR data.
To further elucidate the genetic variation within the Kazakh Tobet breed, a STRUCTURE analysis was performed (Fig. 3). The ΔK method revealed that K = 7 is the optimal number of genetic clusters representing the most genetically similar groups (Fig. 3a), suggesting that seven distinct gene pools form the genetic architecture of Kazakh Tobet dogs (Fig. 3b). In addition, the second highest ΔK value at K = 4 was much larger than the other values, indicating another significant clustering pattern. Remarkably, at K = 7 all genetic clusters were present in all four populations (Fig. 3c).
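The ΔK statistic used above can be sketched as follows. This is a minimal illustration of the Evanno calculation as it is commonly implemented: the absolute second-order difference of the mean log-likelihoods, divided by the standard deviation across replicate runs at that K. The function name `delta_k` and the log-likelihood values are hypothetical and do not come from the STRUCTURE runs of this study.

```python
from statistics import mean, stdev


def delta_k(runs_by_k):
    """Return {K: delta K} for every K that has both K-1 and K+1 available.

    runs_by_k maps K to the list of ln P(X|K) values from replicate runs.
    Each K needs at least two runs with non-identical values, otherwise
    the standard deviation is undefined or zero.
    """
    result = {}
    for k in sorted(runs_by_k):
        if k - 1 not in runs_by_k or k + 1 not in runs_by_k:
            continue  # delta K is undefined at the two ends of the K range
        second_diff = abs(
            mean(runs_by_k[k + 1]) - 2 * mean(runs_by_k[k]) + mean(runs_by_k[k - 1])
        )
        result[k] = second_diff / stdev(runs_by_k[k])
    return result


# Hypothetical replicate log-likelihoods, for illustration only.
toy_runs = {
    1: [-5000, -5002],
    2: [-4800, -4804],
    3: [-4500, -4502],
    4: [-4490, -4494],
    5: [-4485, -4487],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)  # K with the largest delta K
```

In this toy example the log-likelihood rises steeply up to K = 3 and then flattens, so ΔK peaks at K = 3. A secondary peak, such as the one reported at K = 4 in this study, would appear as the second-largest value in `dk`.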

Genetic structure of the Kazakh Tobet dogs. Bayesian clustering on the STR dataset of 107 dogs performed with STRUCTURE v2.3.4 after correction by Evanno et al.29 (CLUMPAK): (a) results of the ΔK method; (b) bar plots in which each dog is represented by a single vertical line whose coloured segments show its relative percentage of membership in each cluster; (c) the admixture structure of the four populations in geographic space for K = 7; (d) neighbour-net tree for the four populations based on the pairwise Fst values.
The analysis of pairwise Fst values showed different degrees of genetic similarity and divergence (Table 3; Fig. 3d). Pop1 and Pop3 showed no genetic differentiation (Fst = 0.000), while Pop2 was minimally different from Pop3 (Fst = 0.003) and moderately different from Pop1 (Fst = 0.017). Pop4 showed the highest genetic differentiation from Pop3 (Fst = 0.023) and moderate differentiation from Pop1 (Fst = 0.015) and Pop2 (Fst = 0.020), making it the most genetically differentiated population in this analysis.
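As an illustration of what a pairwise Fst value expresses, the sketch below computes Nei's single-locus Fst = (HT − HS) / HT for two equally weighted populations from their allele frequencies. This is a simplified textbook estimator chosen for clarity. The estimator behind Table 3 is not stated in this section and may differ (for example, an AMOVA-based estimate averaged over loci). The function name `nei_fst` and the frequencies are hypothetical.

```python
def nei_fst(freqs_a, freqs_b):
    """Nei's single-locus Fst for two equally weighted populations.

    freqs_a, freqs_b: dicts mapping allele -> frequency (each summing to 1).
    """
    alleles = set(freqs_a) | set(freqs_b)
    # Mean expected heterozygosity within the two populations (H_S).
    h_s = sum(1 - sum(p * p for p in f.values()) for f in (freqs_a, freqs_b)) / 2
    # Expected heterozygosity of the pooled population (H_T),
    # based on the mean allele frequencies.
    h_t = 1 - sum(
        ((freqs_a.get(a, 0) + freqs_b.get(a, 0)) / 2) ** 2 for a in alleles
    )
    # If both populations are fixed for the same allele there is no variation.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies at one locus, for illustration only.
same = nei_fst({180: 0.5, 184: 0.5}, {180: 0.5, 184: 0.5})
shifted = nei_fst({180: 0.6, 184: 0.4}, {180: 0.4, 184: 0.6})
fixed = nei_fst({180: 1.0}, {184: 1.0})
```

Identical frequencies give Fst = 0, and populations fixed for different alleles give Fst = 1. The `shifted` case, in which the two allele frequencies differ by 0.2 between populations, gives an Fst of about 0.04.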
Preparing whole-genome sequencing data
We performed WGS on two dogs of the indigenous Kazakh Tobet breed (BioProject ID PRJNA1144634): TB1 (male) and TB63 (female) (Fig. 4). TB1 was selected because of its position close to the intersection of the axes in the PCoA plot, which suggests that its genetic composition may be representative of the overall genetic structure of all analysed Kazakh Tobet dogs, while TB63 was selected because it received the highest expert scores.

Kazakh Tobet dogs TB1 (a) and TB63 (b).
A description of the sequence data can be found in Table 4.
In addition, genome sequences of 43 dogs from 24 breeds traditionally used for guarding or herding livestock and for other work were downloaded from public databases. A total of 21,852,067 autosomal variants were called for all 45 dogs. After applying the GATK criteria for variant filtering, 15,995,420 SNVs were selected for subsequent analyses. The SNV set was further filtered using PLINK 1.9, resulting in 14,668,406 SNVs, which were used as the input dataset for the construction of the phylogenetic tree.
Determining mitochondrial haplogroup
The haplotype A18 (C15814T) was identified for both Kazakh Tobet dogs. The haplotypes for the other dogs can be found in Supplementary Table 1. New haplotypes were identified in five dogs.
Phylogenetic tree
We constructed a neighbour-joining phylogenetic tree based on WGS data for 45 dogs from 25 breeds that have been used in the past as LGDs, HSDs and for other work, including two Kazakh Tobet dogs (Fig. 5). The Kazakh Tobets (TB1 and TB63) and the Central Asian Shepherd Dogs showed a close genetic relationship and clustered with the Akbash. However, the Kazakh Tobet dogs were not grouped as a separate breed. This group of three breeds had the closest common node with a group of four breeds that included Samoyeds, Tibetan Mastiffs, Huskies and Akita dogs, with the Samoyeds forming a more distinct cluster. The Great Pyrenees also showed a separate genetic lineage. The Old English Sheepdog was part of a large group that also included Australian Shepherds and English Shepherds. Breeds such as the Great Dane, the Staffordshire Bull Terrier, the French Mastiff and the Bullmastiff, as well as the Bernese Mountain Dog, the St. Bernard, the Leonberger and the Rottweiler, formed two further large groups of clades. The Newfoundland and the Briard, the German Shepherd and the Standard Schnauzer, as well as the Slovak Cuvac and the Kuvasz, each formed their own narrow groups.

Neighbour-joining phylogenetic tree constructed for working breeds, including Kazakh Tobet dogs (the internal vertices are labelled).
In addition, a maximum likelihood tree was constructed based on the mitochondrial D-loop sequences of 45 dogs (Fig. 6). The breeds were grouped into clades mainly according to their mitochondrial haplotypes. The two Kazakh Tobet dogs (haplotype A18) were close to each other with a branch length of zero and formed a large central cluster alongside Briard and Great Pyrenees (haplotype B1), English Shepherds (haplotype B3) and the Standard Schnauzer (haplotype B12). Central Asian Shepherds were found in haplotype A11 as well as in a new haplotype. Similarly, the Akbash appeared in haplotypes A11 and A20. From a broader perspective, the Kazakh Tobet (A18), the Akbash (A20) and the Central Asian Shepherd were part of a larger group. This grouping is consistent with the topology and placement of these breeds in the WGS tree, although its relatively low bootstrap value of 0.155 means that the support is weak. For the other breeds, dogs of the same breed clustered into a single clade less consistently in the mtDNA D-loop tree than in the WGS tree, as mitochondrial DNA only captures the maternal lineage. Therefore, several breeds in the tree were divided into different clades (e.g. Bernese Mountain Dogs, Standard Schnauzer, Samoyed, etc.).

Maximum likelihood tree for working breeds, including Kazakh Tobet dogs, based on mitochondrial D-loop sequences (new haplogroups are labelled as NA).
Discussion
To improve our understanding of the genetic structure and phylogenetic relationship of the Kazakh Tobet, especially with other breeds traditionally used for guarding and herding livestock, in this study we analysed STR data from 107 Kazakh Tobet dogs from the south, east and north regions of Kazakhstan and from the Bayan-Ulgii district of Mongolia, as well as WGS data from two Kazakh Tobet dogs and 43 dogs from 24 different breeds.
Genetic diversity and structure of the Kazakh Tobet dogs
The Kazakh Tobet dogs showed high genetic variability and diversity, which is reflected in the average number of alleles per locus (Na) and the observed heterozygosity (Ho). The mean Na value in the four populations ranged from 6.00 to 10.22. This level of genetic diversity is comparable to that observed in our earlier study in a smaller group of Kazakh Tobet from the southern region of Kazakhstan (Na = 7.11)30 as well as in other breeds within the molossoid group. The Tibetan Mastiff, for example, has an average Na value of 7.70, based on a panel of 10 STR loci31. Similarly, the English Bulldog has an average Na value of 6.46, derived from 33 STR loci32. In contrast, the genetic diversity of the Kazakh Tobet exceeds that of the French Bulldog33, which has an Na value of 5.10. The observed heterozygosity was over 78% across all Kazakh Tobet dogs, with a range of 76.4–78.5% between the four populations. This was higher than the Ho values observed in other molossoid breeds such as Boxer, Staffordshire Bull Terrier and Rottweiler (Ho = 0.51, 0.63 and 0.47, respectively) when analysing a panel of 15 STRs23, and Tibetan Mastiff and French Bulldog (Ho = 0.69–0.76 and 0.61, respectively) when analysing a panel of 10 STRs31,33,34. In comparison, non-molossoid breeds, such as the Korean Donggyeongi dog, Italian Pointer, Podenco, Jack Russell Terrier and Yorkshire Terrier, showed similar levels of observed heterozygosity, with Ho values of 0.73, 0.72, 0.71–0.72, 0.76 and 0.73, respectively, when analysed with panels of 10–19 STR loci23,35,36. Previous studies on the Tazy, another national Kazakh breed belonging to the sighthound group, also reported high Ho values (Ho = 0.75)37. Heterozygosity is often used to assess the degree of mixing with other breeds. Low observed heterozygosity usually indicates purebred dogs, while high levels are associated with mixed breeds or village dogs. Village dogs38, for example, generally have Ho values between 0.73 and 0.80. The high Ho values observed across all four Kazakh Tobet populations therefore point to substantial crossbreeding.
The average expected heterozygosity for all analysed Kazakh Tobet samples was 0.81, slightly higher than the observed heterozygosity of 0.78. When these two parameters are equal, mating within the population is usually close to random. When the observed heterozygosity is lower than the expected heterozygosity, the population is considered inbred; conversely, when the observed heterozygosity exceeds the expected value, the population is considered outbred. In the Kazakh Tobet dogs, the difference between expected and observed heterozygosity is small, indicating that mating in this cohort is close to random with only minimal inbreeding. This is supported by the inbreeding coefficient, which is significantly different from zero but very small (F = 0.03).
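The relationship between the two heterozygosity measures and the fixation index can be checked with a short calculation. The sketch below uses the common definition F = 1 − Ho/He and the rounded overall values quoted above. The function name `fixation_index` is hypothetical, and the reported F was presumably computed from unrounded per-locus values, so the two need not agree exactly.

```python
def fixation_index(ho, he):
    """Fixation index F = 1 - Ho/He (equivalently (He - Ho) / He)."""
    return 1 - ho / he


# Rounded overall values quoted in the text: Ho = 0.78, He = 0.81.
f_all = fixation_index(0.78, 0.81)

# Sign convention: F > 0 means a heterozygote deficit, F < 0 an excess.
f_deficit = fixation_index(0.70, 0.80)
f_excess = fixation_index(0.85, 0.80)
```

With the rounded inputs the result is approximately 0.04, of the same order as the reported F = 0.03 and close to the value of zero expected under random mating.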
The analysis of the genetic structure of the sample using PCoA confirms the high genetic diversity of the Kazakh Tobet dogs. PCoA visualises the genetic distances between individuals, so that genetically similar dogs lie close together in the plot. In the PCoA plot, the Kazakh Tobet dogs form a group with considerable diversity, as shown by the diffuse distribution along the Y-axis and the genetic outliers. Furthermore, based on the average values of the logarithm of the likelihood function and the dispersion of the estimates obtained in ten runs of STRUCTURE with a selected set of appropriate parameters, the optimal number of clusters in the analysed sample was seven. All seven genetic clusters were present in all four populations, as shown by the admixture structure of the four populations in geographic space.
Although the Kazakh Tobet breed has considerable genetic diversity, there are notable differences between the four populations. The population from South Kazakhstan shows the highest genetic diversity, with the highest average number of alleles per locus (Na = 10.22) and number of effective alleles (Ne = 5.42), but its positive F value (F = 0.03) indicates a slight deficit of heterozygotes and suggests some degree of inbreeding or the influence of population structure. The populations of East Kazakhstan and North Kazakhstan show moderate genetic diversity. The East Kazakhstan population has an average Na of 7.00 and the highest observed heterozygosity (Ho = 0.79). Its fixation index (F = -0.02) is slightly negative, indicating a slight excess of heterozygotes. The population from North Kazakhstan also shows considerable diversity, with an average Na of 6.11 and Ho of 0.76, but with a slight heterozygote deficiency (F = 0.02). Meanwhile, the population from Mongolia is characterised by the lowest genetic diversity, with an average Na of 6.00 and the lowest Ne (4.14), and has a negative F value (F = -0.04), reflecting a possible trend towards outbreeding. However, the F values should be interpreted with caution, as none of the population-level values is statistically significant, which may reflect the small sample sizes.
The analysis of the genetic distance between the four Kazakh Tobet populations shows different levels of genetic divergence. The population from South Kazakhstan shows the closest genetic relationships to the populations from East and North Kazakhstan. Frequent gene flow may have led to the low degree of genetic differentiation between these populations. In contrast, the population from Mongolia is the most genetically differentiated, especially from the population of the northern region, with the highest Fst value of 0.023. This differentiation becomes more understandable when one considers that migration to Mongolia began as early as the 19th century, when Kazakhs living in the Chinese province of Xinjiang began to leave their homeland because they were oppressed by the Dungans and Uyghurs. It can also be assumed that high genetic diversity was already characteristic of the Kazakh Tobet at that time, as all seven genetic clusters of the Kazakh Tobet from Kazakhstan can also be found among the Kazakh Tobet dogs in Mongolia.
Phylogenetic relationships and ancestry of the Kazakh Tobet dogs
Possibly due to the high genetic diversity, the Kazakh Tobet dogs, unlike most other breeds we have observed, did not form a distinct cluster in the phylogenetic tree constructed based on the WGS data. Nevertheless, this breed clearly showed its common genetic origin with the Central Asian Shepherd Dog and the Turkish Akbash breed. It is known that the Central Asian Shepherd Dog is an established breed that originated from several indigenous populations in Central Asia in the 20th century39. Our phylogenetic analysis suggests that the Kazakh Tobet may have been one of these ancestral forms, as well as the Akbash, a Turkish shepherd dog common in the western regions of Turkey. Previous mitochondrial analyses have already shown the close relationship between the Turkish breeds Akbash and Kangal and the Central Asian Shepherd Dog39. The tree we constructed based on the mitochondrial D-loop sequences also confirms the genetic relationship and a possible recent divergence between the Kazakh Tobet, Akbash and the Central Asian Shepherd. The Kazakh Tobet and Akbash may be descended from ancient guard dogs that spread throughout the region thousands of years ago, before the modern state borders were established. Kazakhstan’s strategic location in Central Asia made it an important link in the Silk Road network, facilitating interaction and exchange between East and West. Routes ran through the Ili Valley in Kazakhstan, connecting the region with various parts of Eurasia, including Turkey40. The ancient guard dogs may have been exchanged or bred along these routes, resulting in a common genetic heritage between the Kazakh Tobet and the Akbash. Such exchange would also account for the extremely low genetic differentiation among the Asian LGDs, which was previously demonstrated in the Caucasian Shepherd Dog, North Caucasian Volkodav, Central Asian Shepherd Dog and Turkish Akbash and Kangal based on analyses of mitochondrial DNA39 and was also observed in the Kazakh Tobet in this study.
Interestingly, our haplotype analysis based on mitochondrial read extraction from the WGS showed that both samples of Kazakh Tobets had haplogroup A (haplotype A18), further supporting the ancient origin of the breed. Previous research has shown that haplogroups A, B and C together account for approximately 97.40% of the global dog population, with haplogroup A alone accounting for approximately 72.34% of dogs41, suggesting that haplogroup A has played a crucial role in the evolution of different dog breeds. Recent extensive analyses of haplotype networks have confirmed that haplogroup A was introduced into dog populations during the early stages of domestication42. It is noteworthy that 11 haplotypes of haplogroup A, including haplotype A18, had significantly high betweenness values and were clearly recognisable in this network. Haplotype A18 ranks sixth after A3, A9, A15, A29 and A11 and can rightly be described as ancient. It has already been identified in various regions of the world, including in village dogs from Southeast Asia and the Middle East, in Vietnamese dogs, and in European and Middle Eastern breeds43,44,45. The widespread distribution of haplotype A18 in breeds such as the Serra da Estrela Mountain Dog, Central Asian Shepherd Dog, North Caucasian Volkodav, Caucasian Shepherd Dog, Turkish Akbash and Kangal and Tibetan Mastiff highlights its importance in the historical development of LGDs39,43,46.
However, the hypothesis of a close phylogenetic relationship between the Kazakh Tobets and the Tibetan Mastiffs, which is widespread among dog breeders in Kazakhstan, is refuted by our phylogenetic analysis. According to the genetic distances determined, the Kazakh Tobet has as long an evolutionary history as the Tibetan Mastiff. As far as we know, this is the first phylogenetic study of the Kazakh Tobet from Kazakhstan. In a recent study by Yang et al., a phylogenetic analysis of 15 indigenous Chinese dog breeds, including the Kazakhstan Shepherd Dog, was conducted using genotyping data from a 170,000-SNP chip47. This Shepherd Dog may belong to the Kazakh Tobets, which were brought to the Xinjiang Uygur Autonomous Region of China by Kazakhs. Yang et al. also showed that the Kazakhstan Shepherd Dog is grouped into clades that are distinct from the large Chinese clade including the Tibetan Mastiff and that it does not show a close genetic relationship with western breeds such as the Bernese Mountain Dog, German Shepherd, Newfoundland and Rottweiler.
The WGS included only two dogs of the Kazakh Tobet breed, which is a significant limitation of this study given their high genetic diversity. Due to this diversity, the selection of a suitable dog for WGS is not a trivial endeavour. Increasing the sample size for phylogenetic analysis is expected to confirm or refute the current results more reliably. However, the phylogenetic relationships of the Kazakh Tobet revealed here remain well founded, as our results for the other breeds are consistent with previous studies27,48. In the cladogram of 161 domestic dog breeds based on genotyping data from a 170,000-SNP chip, Samoyeds, Tibetan Mastiffs, Huskies and Akitas were also grouped together. The Great Dane clade was clearly separated from the clades of breeds such as the Staffordshire Bull Terrier, the French Mastiff and the Bullmastiff. The Old English Sheepdog was genetically related to the Australian Shepherd, and the Bernese Mountain Dog was related to the St. Bernard, the Leonberger and the Rottweiler27. In addition, a genetic relationship between the German Shepherds and the Standard Schnauzers has already been established48. It seems that in this study, where we aimed to evaluate the genetic distances between dog breeds with a limited sample, the neighbour-joining tree was well suited to highlight the clear breed-specific divergence. Nevertheless, in future work with larger datasets, a neighbour-net network would allow deeper insights into non-tree-like evolutionary processes between and within breeds, such as hybridisation, recombination or gene flow49. In addition, not only SNVs but also indels, which were excluded from our analysis, may improve the accuracy of phylogenetic reconstruction in our future work.
Studies have shown that while SNVs are more frequent, more stable and more easily aligned across sequences, making them ideal for measuring genetic divergence and evolutionary relationships, indels can also be reliable for phylogenetic analyses50,51. However, there is disagreement about the best method for defining homologous character states and coding strategies50,51. Another limitation of our research is that, although microsatellite markers reflect similar biological processes and patterns to SNPs, the STR-based results still need to be verified with commercial genome-wide SNP arrays. Moving forward, we plan to conduct a comprehensive genome-wide SNP analysis of Kazakh Tobet dogs to confirm their high genetic diversity, identify selective breeding signatures and assess the genetic distinctiveness of the breed compared to related breeds. Together, SNP and STR markers will provide stronger support for the robustness of our results. However, given the evolving understanding of genetic diversity theories, our results may still require careful interpretation, especially when assessed solely through the lens of neutral theory, which has recently come under heavy criticism. Neutral theory, which assumes that most genetic variation is selectively neutral and has formed the basis for understanding genetic variation52,53, has been challenged by recent empirical studies54 showing that it does not fully explain genetic diversity. STRs, traditionally considered neutral, have been shown to have functional roles, such as binding transcription factors55, thus challenging previous assumptions. As an alternative, the theory of maximum genetic diversity has emerged56, which assumes that genetic diversity reaches a maximum saturation point. This could influence the interpretation of our results, so that they may have to be re-evaluated in the future.