DIF analyses can, and indeed should, be used to ensure that assessment items and tests are fair, in particular equally valid for specified examinee subgroups, such as males and females. The DIF-Pack includes both source code and executable code for SIBTEST, POLY-SIBTEST, and Crossing SIBTEST, which together can test variously scored items and bundles of items (a bundle being a set of items treated as a unit) for various kinds of DIF. The source code is provided to facilitate continued improvement of the software, including applications to non-standard settings, research projects, and improvements to the statistical algorithms.
SIBTEST – Assessing Differential Item/Bundle Functioning (DIF/DBF)
SIBTEST implements a nonparametric estimation and hypothesis testing method for assessing DIF in one or more items and/or DBF in one or more bundles of items. The method is based on Shealy and Stout’s (1993) multidimensional IRT model for DIF, further developed by Roussos and Stout (1996). The model assumes that examinees matched on the latent ability the test is designed to measure may nevertheless differ in their expected performance on an item or item bundle because of construct-irrelevant sources of score variation; when they do, DIF occurs. This desired matching of examinees on the latent target ability is done approximately, by matching examinees either on total test score or on a user-specified subscore believed to validly measure the target ability across the studied examinee subpopulations. A common and sound matching choice is the total score with suspected DIF-producing items removed. Through a flexible, user-friendly front-end program, the user specifies the particular DIF/DBF hypothesis tests she wishes to perform, including:
- which item(s) or item bundle(s) will be tested for DIF/DBF,
- the alternative hypothesis to be tested: either a one-sided hypothesis of DIF/DBF against the reference group, a one-sided hypothesis of DIF/DBF against the focal group, or a two-sided hypothesis of DIF/DBF against either group, and
- which items will be used to construct the examinee matching score.
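The core SIBTEST effect estimate can be illustrated with a minimal, uncorrected sketch: examinees are matched on the matching score, and a weighted sum is formed over matching-score levels of the difference in mean studied-item score between the reference and focal groups. The function name and the weighting by the pooled proportion at each level are illustrative simplifications; the actual program adds the regression correction described below and a standard error for hypothesis testing.

```python
import numpy as np

def naive_sibtest_beta(match_ref, item_ref, match_foc, item_foc):
    """Uncorrected SIBTEST-style DIF effect estimate (illustrative only).

    beta = sum over matching-score levels k of p_k * (mean_ref_k - mean_foc_k),
    where p_k is the pooled proportion of examinees at level k. A positive
    beta indicates DIF favoring the reference group.
    """
    match_ref, match_foc = np.asarray(match_ref), np.asarray(match_foc)
    item_ref = np.asarray(item_ref, dtype=float)
    item_foc = np.asarray(item_foc, dtype=float)
    levels = np.union1d(match_ref, match_foc)
    n_total = len(match_ref) + len(match_foc)
    beta = 0.0
    for k in levels:
        r = item_ref[match_ref == k]
        f = item_foc[match_foc == k]
        if len(r) == 0 or len(f) == 0:
            continue  # a level contributes only if both groups are present
        p_k = (len(r) + len(f)) / n_total
        beta += p_k * (r.mean() - f.mean())
    return beta
```

With matched examinees at two score levels, for example, the estimate reduces to a simple weighted average of the two level-wise mean differences.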
SIBTEST uses a sophisticated nonlinear regression correction procedure (Jiang & Stout, 1998) to match examinees, a procedure that has demonstrated improved effectiveness in controlling the false-positive flagging of non-DIF items compared with the original linear regression correction of Shealy and Stout (1993).
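The intuition behind a regression correction can be sketched with the classical linear version: an examinee's observed matching score is regressed toward the group mean by a reliability factor (Kelley's formula), and the group's studied-item mean is then evaluated at that regressed true score by interpolation over the observed level means. This is a schematic of the linear correction idea only, with illustrative names; the nonlinear correction the program actually uses is more elaborate.

```python
import numpy as np

def corrected_item_mean(levels, level_means, group_mean, reliability, k):
    """Evaluate a group's studied-item mean at the regressed true score
    implied by observed matching score k, by linear interpolation over
    the observed level means. Illustrates the linear regression-correction
    idea of Shealy and Stout (1993); names are illustrative.

    levels must be in increasing order for np.interp.
    """
    # Kelley's formula: shrink the observed score toward the group mean
    # in proportion to the reliability of the matching score.
    v = group_mean + reliability * (k - group_mean)
    return float(np.interp(v, levels, level_means))
```

Comparing the two groups at a common regressed true score, rather than at the raw observed score, is what removes the bias that would otherwise inflate false-positive DIF flags when the groups differ in mean ability.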
Crossing SIBTEST tests for crossing DIF/DBF, the condition in which DIF/DBF is experienced by one group at lower ability levels and by the other group at higher ability levels. In this manner, DIF against both specified examinee populations is possible in a single item, occurring as a function of ability level. Analogous to SIBTEST, this program uses a regression correction technique to appropriately match examinees from the two populations, and computes the expected score differences for a suspect item or item bundle at varying ability levels. A strength of the procedure is its capacity to identify the location of the crossing point (i.e., the target ability level at which the expected score difference changes sign). Output includes an estimate of the amount of crossing DIF present, the point at which the crossing occurs, the relative share of the total crossing DIF/DBF occurring against each population (necessary because crossing DIF consists of DIF against both populations, depending on ability level), and the results of the DIF/DBF significance tests. The Crossing SIBTEST procedure is also sensitive to the more commonly occurring unidirectional DIF/DBF, although its hypothesis testing procedure differs somewhat in algorithmic detail from regular SIBTEST.
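The crossing idea can be sketched as follows: estimate the crossing point as the first matching-score level at which the reference-minus-focal difference changes sign, then accumulate the level-wise differences with the sign flipped beyond that point, so that DIF in opposite directions does not cancel. The names and the simple first-sign-change rule are illustrative; Crossing SIBTEST's actual estimator and significance test differ in detail.

```python
import numpy as np

def crossing_effect(levels, d, p):
    """Schematic crossing-DIF summary (illustrative only).

    levels : matching-score levels, in increasing order
    d      : reference-minus-focal mean item-score difference per level
    p      : pooled proportion of examinees at each level

    Returns (effect, crossing_level): the weighted sum of differences
    with signs flipped past the first sign change, and the level at
    which the difference first changes sign (None if it never does).
    """
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    s = np.sign(d)
    changes = np.where(s[:-1] * s[1:] < 0)[0]
    if len(changes) == 0:
        return float(np.sum(p * d)), None  # unidirectional DIF: plain sum
    k_star = changes[0] + 1
    adj = d.copy()
    adj[k_star:] *= -1  # flip so opposite-direction DIF does not cancel
    return float(np.sum(p * adj)), levels[k_star]
```

When the difference never changes sign, the sketch degenerates to the ordinary unidirectional weighted sum, mirroring the remark above that the procedure remains sensitive to unidirectional DIF/DBF.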
Polytomous SIBTEST handles polytomously scored item responses (ordered responses with more than two score categories). This extension of SIBTEST is conceptually straightforward, apart from changes required to account for the additional ordered item score categories in the matching criterion. It uses a revised reliability estimate in performing the regression correction, which guards against an inflated rate of DIF/DBF false positives. This regression correction generalizes the original linear regression correction of Shealy and Stout (1993).
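One familiar internal-consistency reliability estimate that a regression correction can be fed is Cronbach's alpha, sketched below for a matrix of ordered item scores. It is shown only as a well-known stand-in; the revised reliability estimate POLY-SIBTEST actually uses differs in detail.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) matrix of ordered
    item scores; a standard internal-consistency reliability estimate.
    A familiar stand-in, not POLY-SIBTEST's revised estimate."""
    x = np.asarray(scores, dtype=float)
    n_items = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of total scores
    return n_items / (n_items - 1) * (1.0 - sum_item_var / total_var)
```

When the items are perfectly consistent (every examinee's scores rise in lockstep), alpha equals 1; lower consistency shrinks the regressed true scores more strongly toward the group mean in the correction step.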