|Year : 2020 | Volume : 6 | Issue : 1 | Page : 5-11
Study of autosomal short tandem repeat loci using ITO method in full-sibling identification
Li Yuan1, Xu Xu2, He Ren3, Zhao Zhao4, Tong Wang4, Shicheng Hao2, Jinpei Zhang2, Yan Liu2, Yan Xu2
1 Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Ministry of Public Security; Collaborative Innovation Center of Judicial Civilization, Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, Beijing, PR China
2 Collaborative Innovation Center of Judicial Civilization, Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, Beijing, PR China
3 Department of Criminal Science and Technology, Beijing Police College, Beijing, PR China
4 Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Ministry of Public Security, Beijing, PR China
|Date of Submission||21-Jan-2020|
|Date of Decision||19-Feb-2020|
|Date of Acceptance||19-Feb-2020|
|Date of Web Publication||17-Mar-2020|
Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Ministry of Public Security, Beijing; Collaborative Innovation Center of Judicial Civilization, Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, Beijing
Source of Support: None, Conflict of Interest: None
This study aimed to investigate the application of autosomal short tandem repeat (STR) loci using the ITO method and a discriminant function algorithm for full-sibling (FS) identification. A total of 342 pairs of full siblings (FSs) and 3900 pairs of unrelated individuals (UIs) were genotyped at 51 STR loci. The loci were grouped according to their discrimination power (DP) values and the number of loci, and the FS index (FSI) values of the FS and UI pairs were calculated by the ITO method. Discriminant functions for FS–UI classification were established using Fisher's discriminant analysis in SPSS 19.0 software. All the lgFSI values in the FS and UI groups followed a normal distribution, and there were significant differences between the two groups. A higher average DP value was associated with a more significant difference, as was a greater number of STR loci detected. Receiver operator characteristic curves showed that the accuracy of FS identification can be affected by both locus polymorphism and the number of loci detected. Comparing the false-positive and false-negative rates of the discriminant functions among the locus groups, a higher average DP value and a larger number of loci detected were associated with a lower misclassification rate and were more helpful for FS–UI identification. The ITO-based discriminant analysis method has high applicability in FS–UI tests. Testing of a greater number of STR loci promotes FS identification.
Keywords: Discriminant analysis, forensic biological evidence, full-sib relation, ITO method, short tandem repeat
|How to cite this article:|
Yuan L, Xu X, Ren H, Zhao Z, Wang T, Hao S, Zhang J, Liu Y, Xu Y. Study of autosomal short tandem repeat loci using ITO method in full-sibling identification. J Forensic Sci Med 2020;6:5-11
|How to cite this URL:|
Yuan L, Xu X, Ren H, Zhao Z, Wang T, Hao S, Zhang J, Liu Y, Xu Y. Study of autosomal short tandem repeat loci using ITO method in full-sibling identification. J Forensic Sci Med [serial online] 2020 [cited 2022 May 18];6:5-11. Available from: https://www.jfsmonline.com/text.asp?2020/6/1/5/280895
| Introduction|| |
The identification of full siblings (FSs) plays an important role in identifying individuals in criminal and civil law cases and in the search for missing persons when the parents are absent. However, FS identification can be difficult because, between siblings, the two alleles at each short tandem repeat (STR) locus can both be the same, one can be the same, or both can be different. Mitochondrial DNA and genetic markers on sex chromosomes can be helpful for definitive identification. As mitochondrial DNA is maternally inherited and all offspring share the same mitochondrial DNA sequence, the finding of different profiles in mitochondrial DNA testing can rule out the possibility that the tested samples are from the same maternal lineage. In addition, Y-STRs reflect paternal inheritance, with all males of the same paternal lineage sharing the same Y-STR profile; therefore, males of different Y-STR types cannot be from the same paternal lineage. Similarly, X-STR testing may provide complementary information for FS identification. However, Y-STRs and X-STRs are useful only for the identification of brothers and of sisters, respectively. The heterogeneity and high mutation rate of mtDNA have limited the application of mitochondrial genetic markers in FS identification, and they can only provide complementary information. To obtain evidence to support FS identification, autosomal genetic markers must be typed.
At present, autosomal STRs are widely used in forensic science, and autosomal STR analysis has been applied in FS identification using the ITO method (Wing-Kam Fung, Statistical Theory and Software for Mixtures and Paternity Testing Based on STR Loci, 2000) and the identity-by-state (IBS) method. The ITO method uses the allele frequency distribution in a population to calculate the FS index (FSI) between two individuals. If the FSI exceeds a threshold (usually set as 1000 or 10,000), the result tends to support a FS relationship. The ITO method showed higher accuracy than the IBS method for such analyses. However, the impact of the number of tested loci and of the power of discrimination of the tested STR loci when using the ITO method has rarely been investigated. In the present study, 51 autosomal STR loci were characterized for each individual, and the impact of different combinations of autosomal STRs on the ITO method was investigated in the context of FS identification.
| Methods|| |
Samples and short tandem repeat analysis
A total of 342 FS pairs of Han Chinese origin from Beijing provided informed consent to participate in this study. FS pairs were selected from trio (father–child–mother) families with two or more offspring and were confirmed to be brothers and sisters of the same father and mother, with the exclusion of identical twins and of cases in which an STR mutation was identified in the pedigree after the “father–child–mother” paternity test. The 3900 unrelated pairs were formed by randomly pairing unrelated individuals (UIs). This study was approved by the institute's ethics committee. The sample collection, DNA analysis, and storage were conducted in accordance with the humane and ethical research principles of the Key Laboratory of Evidence Science, China University of Political Science and Law.
The DNA samples were amplified using Goldeneye20A™ (PeopleSpot Inc., Beijing, China), AGCU EX22, AGCU 21 + 1 (AGCU ScienTech Inc., Wuxi, China), and a five-colored fluorescent multiplex amplification system established in our laboratory. For each sample, 51 STR loci were detected. Polymerase chain reaction products were separated by electrophoresis on an ABI 3130 genetic analyzer, and ABI Genemapper ID 3.2 software (ThermoFisher Scientific, Waltham, MA, USA) was used for analysis. Internal control standards within the laboratory were used in accordance with the recommendations of the Paternity Testing Commission of the International Society for Forensic Genetics.
In this study, 323 healthy unrelated Han Chinese individuals from Beijing were randomly selected and used for the detection of genetic polymorphism at 51 STR loci. The genotype frequency distribution of the STR loci was tested for conformity with Hardy–Weinberg equilibrium using GENEPOP (4.0) software, and linkage disequilibrium between loci was also tested. PowerStatsV12 software (Promega, Madison, WI, USA) was used to calculate the allele frequencies.
Statistical analysis and discriminant function creation
According to the allele frequencies of each locus, the ITO method was used to calculate the FSI of every FS and UI pair for each of the five locus combinations, and the FSI was then converted to lgFSI. SPSS 19.0 software was used for descriptive statistics of lgFSI, the t-test of the mean lgFSI of FSs and UIs, and Fisher's discriminant analysis and function creation. Matlab 2013a software was used for normal distribution curve fitting of lgFSI in the five groups of FSs and UIs. Stata/MP 13.1 software (StataCorp LP, College Station, TX, USA) was used to plot the receiver operator characteristic (ROC) curves and to compare the areas under them.
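The FSI calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Hardy–Weinberg equilibrium, no mutation and no population substructure, writes the full-sibling likelihood ratio in its identity-by-descent form (weights 1/4, 1/2, and 1/4 for sharing 0, 1, or 2 alleles by descent, which is how the ITO matrices are usually combined for full siblings), and assumes that lgFSI denotes the base-10 logarithm of the product of per-locus FSI values. The function names, locus names, and allele frequencies are hypothetical.

```python
import math

# Assumed IBD probabilities (k0, k1, k2) for full siblings.
K_FS = (0.25, 0.5, 0.25)


def genotype_prob(genotype, freq):
    """Population probability of an unordered genotype under HWE."""
    a, b = genotype
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def prob_given_one_ibd(g1, g2, freq):
    """P(g2 | g1, exactly one allele shared identical by descent)."""
    c, d = g2
    total = 0.0
    for x in g1:  # each allele of g1 is the shared one with probability 1/2
        if c == d:
            total += 0.5 * (freq[c] if x == c else 0.0)
        elif x == c:
            total += 0.5 * freq[d]
        elif x == d:
            total += 0.5 * freq[c]
    return total


def locus_fsi(g1, g2, freq):
    """Per-locus full-sibling index: P(pair | FS) / P(pair | unrelated)."""
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    p2 = genotype_prob(g2, freq)
    ratio_ibd1 = prob_given_one_ibd(g1, g2, freq) / p2
    ratio_ibd2 = (1.0 if g1 == g2 else 0.0) / p2
    k0, k1, k2 = K_FS
    return k0 + k1 * ratio_ibd1 + k2 * ratio_ibd2


def lg_fsi(pair_profile, freqs):
    """Sum of log10 per-locus FSI values (loci treated as independent)."""
    return sum(
        math.log10(locus_fsi(g1, g2, freqs[locus]))
        for locus, (g1, g2) in pair_profile.items()
    )


# Hypothetical two-locus example; frequencies are illustrative only.
freqs = {"LocusA": {10: 0.1, 11: 0.9}, "LocusB": {7: 0.2, 8: 0.3, 9: 0.5}}
pair = {"LocusA": ((10, 10), (10, 10)), "LocusB": ((7, 8), (8, 7))}
print(lg_fsi(pair, freqs))
```

A pair that shares no allele at a locus contributes a per-locus FSI of 0.25, so lgFSI falls as non-sharing loci accumulate, while sharing rare alleles raises it sharply.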
| Results|| |
Hardy–Weinberg equilibrium test and disequilibrium test
Statistical analysis was performed on the genotype distributions of 51 STR loci [Table 1]. There was no significant divergence from Hardy–Weinberg equilibrium with Bonferroni correction (P = 0.05/51 = 0.00098). The pair-wise linkage disequilibrium test of 51 STR loci showed P < 0.05 [Supplementary Table 1] for 93 out of the total of 1275 pairs of loci, and nine pairs still showed statistically significant differences after Bonferroni correction (P = 0.05/1275 = 0.000039), namely, FGA/D6S1043, FGA/D20S470, D7S820/D18S51, D7S3048/D20S470, D8S1179/CSF1PO, D9S1122/D17S1290, D12ATA63/D18S853, D12ATA63/D17S1290, and D13S317/Penta E. Given that the loci in each of these nine pairs are located on different chromosomes, in line with other studies, this study does not consider the presence of linked inheritance for these loci in the groups of STR combinations.
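The two Bonferroni-corrected thresholds quoted above follow directly from the number of loci and the number of locus pairs. A short sketch of the arithmetic, with variable names chosen here for illustration:

```python
from math import comb

alpha = 0.05   # nominal significance level
n_loci = 51    # STR loci tested

# Hardy-Weinberg tests: one test per locus.
hwe_threshold = alpha / n_loci

# Linkage disequilibrium tests: one test per unordered pair of loci.
n_pairs = comb(n_loci, 2)
ld_threshold = alpha / n_pairs

print(n_pairs, hwe_threshold, ld_threshold)
```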
|Table 1: Fifty-one autosomal short tandem repeat loci and each locus's genetic parameters|
In the Han Chinese population from Beijing, polymorphism information content of 0.5485–0.9048, observed heterozygosity of 0.5988–0.9105, discrimination power (DP) of 0.7930–0.9823, and probability of paternity exclusion of 0.2894–0.8169 were found for these 51 STR loci, the results of which are shown in [Table 1]. We grouped the STRs into combinations according to their average DP values, as shown in [Table 2].
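For readers unfamiliar with these parameters, the sketch below computes observed heterozygosity, DP, and polymorphism information content from a list of genotypes using the standard textbook definitions (DP as one minus the sum of squared genotype frequencies). Whether PowerStats applies exactly these formulas is an assumption, and the function name and sample data are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Observed heterozygosity, DP and PIC for one locus.

    genotypes: list of 2-tuples of allele labels, one per individual.
    """
    n = len(genotypes)
    genos = [tuple(sorted(g)) for g in genotypes]

    # Genotype and allele frequencies estimated from the sample.
    geno_freq = [count / n for count in Counter(genos).values()]
    allele_counts = Counter(a for g in genos for a in g)
    p = [count / (2 * n) for count in allele_counts.values()]

    het_obs = sum(1 for a, b in genos if a != b) / n
    dp = 1 - sum(f * f for f in geno_freq)
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = 1 - sum(x * x for x in p) - sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return {"het_obs": het_obs, "dp": dp, "pic": pic}


# Hypothetical four-person sample at a two-allele locus.
print(locus_stats([(1, 1), (1, 2), (2, 1), (2, 2)]))
```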
LgFSI of n-short tandem repeat combinations
For Groups I–V of STR combinations, the lgFSI frequency distributions and the fitted normal distribution curves for the 342 FS pairs and 3900 UI pairs are shown in [Figure 1]. The FS and UI distribution curves overlap for each combination of STRs. In the comparison of Groups I, II, and III, the overlap becomes smaller as the average DP value increases. In the comparison of Groups II, IV, and V, the combinations have similar average DP values, but the area of overlap becomes smaller as the number of detected loci increases.
|Figure 1: The frequency of lgFSI in full-sibling and unrelated individual pairs and the normal distribution curve fitting in the short tandem repeat locus groups (Groups I–V are shown from top to bottom)|
In all five combinations, there were significant differences in lgFSI between FSs and UIs [Table 3]. For each combination, [Table 4] shows the proportions of FSs and UIs that fall in the region where the two curves overlap, that is, where both groups take the same lgFSI values.
|Table 3: Statistical description of the lg full-sibling index in full siblings and unrelated individuals of Groups I-V|
Receiver operator characteristic curve determines the effectiveness of each short tandem repeat combination
Different combinations of STR loci were used for FS identification, and ROC curves were applied to determine the effectiveness of each STR combination [Figure 2]. The areas under the ROC curves and their variation were compared between combinations, with the P values shown in [Table 5].
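As an illustration of what the area under an ROC curve measures here, the following sketch computes it directly from two lists of lgFSI values as the probability that a randomly chosen FS pair scores higher than a randomly chosen UI pair, counting ties as one half. It does not reproduce the Stata comparison of areas used in the study, and the function name and scores are hypothetical.

```python
def roc_auc(fs_scores, ui_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for fs in fs_scores:
        for ui in ui_scores:
            if fs > ui:
                wins += 1.0
            elif fs == ui:
                wins += 0.5
    return wins / (len(fs_scores) * len(ui_scores))


# Hypothetical lgFSI values: one UI pair outscores one FS pair.
print(roc_auc([3.1, 5.4, -0.5], [-4.2, -3.0, 0.8]))
```

An area of 1.0 means the two groups are perfectly separated by lgFSI, and 0.5 means lgFSI carries no information about the relationship.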
|Figure 2: Receiver operator characteristic curves of short tandem repeat groups (right for Groups I, II, and III; left for Groups II, IV, and V)|
|Table 5: Difference of area and fluctuation of receiver operator characteristic curve among locus groups (P values)|
Establishment of Fisher's discriminant function of lgFSI
Using SPSS 19 software (SPSS Inc., Chicago, IL, USA), discriminant functions were established by Fisher's discriminant analysis of lgFSI for each of the five combinations. The discrimination rule is $L(S) = \max\{L_{FS}(S), L_{UI}(S)\}$: both discriminant functions are evaluated for an individual pair, and the pair is assigned to the category with the larger L value. The discriminant functions and the corresponding misclassification rates of the five STR combinations are shown in [Table 6]. The posterior probability of FS and UI classification with different total number of alleles is shown in [Supplementary Table 2].
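The discrimination rule can be sketched for a single predictor (lgFSI) and two groups. The code below fits linear classification functions of the form L_k(x) = (m_k / s²)·x − m_k² / (2s²) + ln(prior_k), where m_k is the group mean and s² the pooled within-group variance, and assigns a pair to the group with the larger value. This is an assumption about the form of the functions reported in [Table 6], not a reproduction of the SPSS output; equal priors, the function names, and the sample scores are all hypothetical.

```python
import math
from statistics import mean


def fit_classification_functions(fs_scores, ui_scores, priors=(0.5, 0.5)):
    """Return {group: (slope, intercept)} for FS and UI."""
    groups = {"FS": fs_scores, "UI": ui_scores}
    means = {name: mean(scores) for name, scores in groups.items()}

    # Pooled within-group variance with n_total - n_groups degrees of freedom.
    n_total = sum(len(scores) for scores in groups.values())
    ss_within = sum(
        (x - means[name]) ** 2 for name, scores in groups.items() for x in scores
    )
    pooled_var = ss_within / (n_total - len(groups))

    functions = {}
    for (name, m), prior in zip(means.items(), priors):
        slope = m / pooled_var
        intercept = -m * m / (2 * pooled_var) + math.log(prior)
        functions[name] = (slope, intercept)
    return functions


def classify(lg_fsi_value, functions):
    """Assign the pair to the group whose function value is larger."""
    return max(
        functions,
        key=lambda name: functions[name][0] * lg_fsi_value + functions[name][1],
    )


# Hypothetical training scores, not the study data.
funcs = fit_classification_functions([4.0, 5.0, 6.0], [-4.0, -5.0, -6.0])
print(classify(1.0, funcs), classify(-0.1, funcs))
```

With equal priors and a shared variance, the decision boundary sits at the midpoint of the two group means, so the misclassification rate depends on how far apart the means are relative to the spread of lgFSI.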
|Table 6: Discriminant functions and misclassification rate in full-sibling and unrelated individuals in five short tandem repeat locus groups|
| Discussion|| |
Currently, routine autosomal DNA analyses are mainly based on the CODIS core STR loci, such as D2S1338, TPOX, D3S1358, FGA, D5S818, CSF1PO, D6S1043, D7S820, D8S1179, TH01, D12S391, vWA, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, and Penta D, as well as 19 other STR loci. However, in the identification of grandfather and grandson, siblings, as well as complex families and cases with DNA mutations, it is often necessary to increase the number of tested autosomal STRs. Through surveying 51 autosomal STR loci in the Han Chinese population in Beijing, significant differences were shown in the DP for individual loci, such as DP values of over 0.9500 for Penta E, D7S3048, D6S1043, D18S51, D20S470, D2S1338, FGA, D15S659, D12S391, D11S2368, and D8S1179, but DP values of 0.8500 or less for D1S1677, D1S1627, TH01, D1GATA113, and TPOX. Therefore, when performing FS identification, the DP values of supplemental STRs need to be considered with respect to their impact on the final determination.
In the present study, lgFSI showed a normal distribution in both FSs and UIs, with average lgFSI of 3.6725, 5.4187, 6.3363, 8.2456, and 10.9795 in FSs, and of −3.2374, −4.3762, −4.9656, −6.6185, and −8.6449 in UIs for Groups I–V, respectively. When the number of loci detected was fixed (such as in Groups I, II, and III), the overlap region decreased as the average DP value increased. When the average DP value was fixed, the differences became more significant and the overlap region decreased as the number of loci in an STR combination increased.
The comparison of ROC curve areas of the five STR combinations showed significant differences in classification accuracy between all the paired groups, indicating that the accuracy of FS identification can be affected by both the number of loci and the discriminatory ability of loci.
When using the detection systems of Groups I–V for the 342 FS pairs and 3900 UI pairs, the highest lgFSI values were 3, 2, 3, 1, and 0 in UIs and the lowest ones were −2, −2, −1, 0, and 1 in FSs. Given the lgFSI thresholds specified in [Table 7] for opinions that tend to support one relationship, the detection efficiency of each system is shown in the same table; with different thresholds, the detection efficiency will change accordingly.
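How detection efficiency depends on the chosen thresholds can be illustrated with a small sketch. It splits each group into pairs whose lgFSI exceeds an upper threshold (tending to support FS), falls below a lower threshold (tending to support UI), or lies in between (inconclusive). The metric names, the strict inequalities, and the scores are assumptions made for illustration and do not reproduce the columns of [Table 7].

```python
def threshold_performance(fs_scores, ui_scores, lower, upper):
    """Fractions of each group falling into the three opinion zones."""

    def split(scores):
        n = len(scores)
        return {
            "supports_FS": sum(s > upper for s in scores) / n,
            "supports_UI": sum(s < lower for s in scores) / n,
            "inconclusive": sum(lower <= s <= upper for s in scores) / n,
        }

    return {"FS": split(fs_scores), "UI": split(ui_scores)}


# Hypothetical lgFSI values and thresholds.
result = threshold_performance(
    fs_scores=[5.0, 3.0, 0.0, -3.0],
    ui_scores=[-5.0, -4.0, 0.0, 4.0],
    lower=-2.0,
    upper=2.0,
)
print(result)
```

Widening the gap between the two thresholds lowers the error fractions ("supports_UI" for true FS pairs, "supports_FS" for true UI pairs) at the cost of more inconclusive results.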
|Table 7: Detection threshold of lg full-sibling index and system performance of the short tandem repeat combinations in full sibling-unrelated individual identification|
Discriminant analysis is a multivariate statistical method for categorizing samples: a discriminant function is determined from the observed data of samples of known category and is then used to classify new samples of unknown category. Discriminant functions are established based on the number of alleles shared between FSs and UIs; new samples are analyzed with discriminant functions to determine whether they are attributable to FSs or UIs. The posterior probability of FS and UI judgment with different groups is shown in [Supplementary Table 2]. In the present study, misclassification rates of discriminant functions were 2.6161%, 1.3619%, 0.8977%, 0.5976%, and 0.4386% for Groups I–V, respectively. Upon comparing the misclassification rates of Groups I, II, and III, a higher average DP value was shown to be associated with a lower misclassification rate, indicating that STR loci with high DP values are more beneficial for FS identification. Upon comparing the misclassification rates of Groups II, IV, and V, a greater number of detected STR loci was found to be associated with a lower misclassification rate, indicating that more STR loci facilitate FS identification.
| Conclusion|| |
The differences in ROC curve performance and in the misclassification rates of the discriminant functions show that, for FS identification with the same number of genetic markers, autosomal STRs with higher average DP values are more beneficial. When the average DP values are the same, detecting more STR loci is beneficial for interpreting the results and increases the accuracy of classification in FS identification. Therefore, in FS identification, STRs with higher polymorphism and higher DP values should be selected when testing additional autosomal STRs.
This work was supported by the Open Project of Key Laboratory of Forensic Genetics, Ministry of Public Security (2017FGKFKT03). We are grateful to Yacheng Liu and Chengtao Jiang for their valuable technical assistance.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Tang MY, Huang J, Cai JH, Huang XS, Xu W, Qu ZX, et al. Identification of sibling brothers using STR and Y-biallelic markers. Fa Yi Xue Za Zhi 2012;28:190-4.
Zhang SH, Zhao SM, Li L. Identification of sibling sisters using STR and SNP. Fa Yi Xue Za Zhi 2010;26:185-7.
Chen F, Wang SY, Zhang RZ, Hu YH, Gao GF, Liu YH, et al. Analysis of mitochondrial DNA polymorphisms in Guangdong Han Chinese. Forensic Sci Int Genet 2008;2:150-3.
Butler JM. Short tandem repeat typing technologies used in human identity testing. Biotechniques 2007;43:ii-v.
Aoki Y, Nakayama Y, Saigusa K, Nata M, Hashiyada M. Comparison of the likelihood ratio and identity-by-state scoring methods for analyzing sib-pair test cases: A study using computer simulation. Tohoku J Exp Med 2001;194:241-50.
Li CT, Sun HY, Zhao SM, Lu HL, Li L, Hou YP. Forensic Identification Technical Specification: The Implementation Specification of Biological Full-Sib Identification (SF/Z JD0105002-2014), Issued by Administration Bureau of Judicial Identification, Ministry of Justice P.R.C; 2014.
Li CT, Hou YP, Li L, Zhang SH, Liu YC, Sun HY. National standard (GB/T 37223–2018), Specification of parentage testing. Issued by Administration Bureau of Judicial Identification. Ministry of Justice, P.R.C; 2018.
Zhao SM, Zhang SH, Que TZ, Zhao ZM, Lin Y, Li L, et al. Establishment of universal algorithms for commonly used kinship indices between two individuals. Fa Yi Xue Za Zhi 2011;27:330-3.
Xu M, Du Q, Ma G, Chen Z, Liu Q, Fu L, et al. Utility of ForenSeq™ DNA signature prep kit in the research of pairwise 2nd-degree kinship identification. Int J Legal Med 2019;133:1641-50.
Presciuttini S, Casarino L, Verdiani S, et al. Distribution of the number of alleles identical by state summed over several loci among full-sib pairs and unrelated individuals. In: George F Sensabaugh, editors. Progress in Forensic Genetics 8. Netherlands: Elsevier Science; 2000. p. 88-90.
Gui J, Liu HB, Liao QX, Xu X, Lu D, Yuan L. Establishment of a 15 loci multiplex amplification system and the genetic polymorphism in Xinjiang Uygur population. Fa Yi Xue Za Zhi 2015;31:23-7.
Morling N, Allen RW, Carracedo A, Geada H, Guidet F, Hallenberg C, et al. Paternity testing commission of the international society of forensic genetics: recommendations on genetic investigations in paternity cases. Forensic Sci Int 2002;129:148-57.
Yuan L, Ge J, Lu D, Yang X. Population data of 21 non-CODIS STR loci in Han population of northern China. Int J Legal Med 2012;126:659-64.
Li CT, Hou YP, Li L, Zhang SH, Liu YC, Sun HY. Forensic Identification Technical Specification: Paternity Testing Specification (SF/Z JD0105001–2010), Issued by Administration Bureau of Judicial Identification. Ministry of Justice, P.R.C; 2010.
Wang J, Xia JL, Ye DQ. Improvement of discriminant analysis in the medical area. Appl Stat Manage 2008;27:369-76.