The Effect of the Difference in the Equating Method on Equating Two Forms of an Achievement Test According to Item Response Theory

Document Type : Original Article

Authors

Department of Educational Psychology and Educational Statistics, Faculty of Education for Boys in Cairo, Al-Azhar University, Egypt

Abstract

The current research aimed at identifying the effect of the difference in the equating method (Mean/Mean, Mean/Sigma, Haebara, Stocking-Lord) on equating two forms of an achievement test according to item response theory. The research instruments were two forms (X and Y) of an achievement test in science prepared by the researcher. The participants were 1032 pupils enrolled in the second year of the preparatory stage in the Al-Azhar Educational District in Cairo, divided into two non-equivalent groups. Data were collected using the non-equivalent groups with anchor test design and analyzed with the statistical programs SPSS, jMetrik, BILOG-MG, and IRTEQ, using repeated-measures ANOVA and the paired-samples t-test. The results revealed statistically significant differences at the 0.01 level between the means of the values produced by the equating methods under item response theory, with the Mean/Sigma method yielding the highest values compared to the other methods. Furthermore, there were statistically significant differences at the 0.01 level between the means of the equating-accuracy values for the two test forms under item response theory, with the Mean/Sigma method being more accurate than the other methods.
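For readers unfamiliar with the moment methods named above, the following is a minimal sketch (not taken from the study) of how the Mean/Mean and Mean/Sigma scale-transformation coefficients can be computed from anchor-item parameter estimates, using the standard formulas described in Kolen and Brennan (2014); the item parameter values below are hypothetical. The characteristic-curve methods (Haebara and Stocking-Lord) instead choose the slope A and intercept B by minimizing differences between item or test characteristic curves rather than by matching moments.

```python
import numpy as np

# Hypothetical anchor-item parameter estimates (a = discrimination, b = difficulty)
# obtained from separate calibrations of Form X and Form Y.
a_x = np.array([0.9, 1.1, 1.3, 0.8, 1.0]); b_x = np.array([-0.6, 0.1, 0.7, 1.2, -0.2])
a_y = np.array([1.0, 1.2, 1.4, 0.9, 1.1]); b_y = np.array([-0.4, 0.3, 0.9, 1.5, 0.0])

def mean_sigma(b_x, b_y):
    # Mean/Sigma: slope A from the SDs of anchor difficulties, intercept B from their means.
    A = b_y.std(ddof=1) / b_x.std(ddof=1)
    return A, b_y.mean() - A * b_x.mean()

def mean_mean(a_x, a_y, b_x, b_y):
    # Mean/Mean: slope A from the means of anchor discriminations.
    A = a_x.mean() / a_y.mean()
    return A, b_y.mean() - A * b_x.mean()

# Form X parameters placed on the Form Y scale: b* = A*b + B, a* = a / A.
for name, (A, B) in [("Mean/Sigma", mean_sigma(b_x, b_y)),
                     ("Mean/Mean", mean_mean(a_x, a_y, b_x, b_y))]:
    print(f"{name}: A = {A:.3f}, B = {B:.3f}")
```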

Keywords


Ahmed A. H. (2014). The effectiveness of methods for equating two forms of a test built in the light of Sternberg's successful intelligence theory according to classical measurement theory. The Scientific Journal of the Faculty of Kindergarten, Mansoura University, 1(1), 447-503.
Ahmed B. (2019, April 27-28). The relative effectiveness of methods for equating two forms of the Scholastic Aptitude Test (SAT I) in mathematics in the light of item response theory. The Fifth International Conference of the Faculty of Education for Boys in Cairo, Al-Azhar University, “Al-Azhar and General Pre-University Education and the Challenges of the Twenty-First Century - Reality and Expectations”, Volume 2, 695-821.
Ahmed A. (2019). The effect of the percentage of missing items in the anchor test and the method of handling them on the accuracy of vertical equating. (Unpublished doctoral dissertation), College of Education, Yarmouk University.
Ayala, R. (2017). The theory and practice of item response theory (Abdullah Al-Kilani & Ismail Al-Bursan, Trans.). Riyadh: King Saud University Press. (Original work published 2009)
Ismael El. (2002). A comparative psychometric study of some item response models in selecting items for criterion-referenced tests. (Unpublished doctoral dissertation), Faculty of Education, Al-Azhar University.
Amal Al. (2020). Equating the booklets of the international reading literacy study test (PIRLS 2011) in the Sultanate of Oman using item response theory. Journal of Educational Sciences, College of Education, Qatar University, (12), 68-93.
Iyad H. (2011). Detecting local dependence between pairs of test items using the Q3 index. Journal of Educational and Psychological Sciences, 12(1), 40-68.
Ayman G. (2015). Building a battery of tests to measure the successful intelligence of secondary school students in the light of item response theory: A psychometric study. (Unpublished doctoral dissertation), Girls' College of Arts, Sciences and Education, Ain Shams University.
Ihab El. (2012). Test equating and its relationship to some psychometric variables: A simulation and applied study. (Unpublished doctoral dissertation), Faculty of Education, Ain Shams University.
Khaled Al. (2014). The effect of anchor-test length and sample size on the accuracy of equated scores for two forms of a mathematics test under the common-item (anchor) design. (Unpublished doctoral dissertation), Institute of Educational Studies, Cairo University.
Diala Al. & Ramadan D. (2017). Comparing the effectiveness of the Braun-Holland and equipercentile methods using common items in equating two forms of an achievement test in mathematics. Al-Baath University Journal, 39(43), 119-148.
Rashid Al. (2001). Test equating: Its concept, methods, and problems of application. Journal of Educational and Psychological Sciences, 2(4), 107-141.
Rayid Al. (2010). Comparing the effectiveness of the Tucker linear method and Levine's method in equating tests when using a design based on an anchor test and non-random groups. Mutah Journal for Research and Studies, Humanities and Social Sciences Series, 25(7), 11-53.
Rayid Al. (2012). Comparing the effectiveness of true-score and observed-score equating methods in equating tests using an anchor test and nonequivalent groups. Journal of Educational and Psychological Sciences, 13(2), 365-394.
Rushdi T. (2004). Content analysis in the humanities. Cairo: Arab Thought House.
Salah El. A. (2005). Unidimensional and multidimensional item response models and their applications in psychological and educational measurement. Cairo: Arab Thought House.
Issa Al. (2009). The effect of the items' representation of the content, the percentage of common items, and item-weighting methods on the accuracy of equated scores for test forms under the common-item (anchor) design. (Unpublished doctoral dissertation), College of Graduate Studies, University of Jordan.
Maya B. (2010). The effect of equating designs, average test difficulty, and ability distribution on equating the scores of multidimensional tests using item response theory. (Unpublished doctoral dissertation), Institute of Educational Studies, Cairo University.
Maysa A. (2010). Equating two forms of the Test of Nonverbal Intelligence (TONI) using different equating methods in light of some variables affecting the results. The Egyptian Journal of Psychological Studies, 20(66), 371-411.
Mohamed A. (2010). The use of item response models in calibrating the items of some cognitive tests. (Unpublished doctoral dissertation), Faculty of Education, Minia University.
Manar T. (2006). The effectiveness of using item response theory (IRT) in equating the scores of multidimensional tests and the variables affecting it. (Unpublished doctoral dissertation), Institute of Educational Studies, Cairo University.
Heba D. (2020). The effect of missing-value treatment methods on the estimation of the parameters of some item response theory models. (Unpublished doctoral dissertation), College of Education, University of Aleppo.
Yasser H. (2019). The effect of differences in the equating method, score-estimation methods, and item-writing rules on the accuracy of item parameters and individuals' abilities in the light of classical measurement and the three-parameter logistic model. Journal of the Faculty of Education, Assiut University, 35(7), 352-434.
Youssef Al. (2011). Comparing the kernel, equipercentile, and item response theory methods under the common-item design in the accuracy of equating polytomously scored tests. (Unpublished doctoral dissertation), College of Graduate Studies, University of Jordan.
Youssef Al. (2016). The effect of sample size and test length on the accuracy of equating polytomously scored tests using the kernel method. Journal of Educational and Psychological Sciences, 17(3), 201-228.
Youssef El. (2019). The effectiveness of observed-score and kernel methods in equating test scores. Journal of Educational Science Studies, University of Jordan, 46, 204-223.
Third: Foreign References
Arikan, C. & Gelbal, S. (2018). A comparison of traditional and kernel equating methods. International Journal of Assessment Tools in Education, 5(3), 417-427.
Asiret, S. & Sunbul, S. (2016). Investigating test equating methods in small samples through various factors. Educational Sciences: Theory & Practice, 16(2), 647-668.
Baker, F. (2001). The basics of item response theory (2nd ed.). USA: The ERIC Clearinghouse on Assessment and Evaluation.
Born, S., Fink, A., Spoden, C. & Frey, A. (2019). Evaluating different equating setups in the continuous item pool calibration for computerized adaptive testing. Frontiers in Psychology, 10, 1-14.
Castaneda, R. (2017). A model-building approach to assessing Q3 values for local item dependence. (Unpublished doctoral dissertation), University of California, Merced.
Chen, F., Huang, X. & Gregor, D. (2009). Equating or linking: Basic concepts and a case study. Washington: Center for Applied Linguistics (CAL).
Dawber, T. (2004). Robustness of Lord's formulas for item difficulty and discrimination conversions between classical and item response theory models. (Unpublished doctoral dissertation), University of Alberta.
Dorans, N., Moses, T. & Eignor, D. (2010). Principles and practices of test score equating. New Jersey: Educational Testing Service (Research Report: ETS RR-10-29).
Embretson, S. & Reise, S. (2000). Item response theory for psychologists. New Jersey: Lawrence Erlbaum Associates.
Felan, G. (2002, February 14-16). Test equating: Mean, linear, equipercentile, and item response theory. Paper presented at the annual meeting of the Southwest Educational Research Association, Austin, TX, 1-24.
Fitzpatrick, A. & Yen, W. (2001). The effects of test length and sample size on the reliability and equating of tests composed of constructed-response items. Applied Measurement in Education, 14(1), 31-57.
Georgiev, N. (2008). Item analysis of C, D and E series from Raven's standard progressive matrices with item response theory two parameter logistic model. Europe's Journal of Psychology, 8, 1-17.
Gleason, J. (2008). An evaluation of mathematics competitions using item response theory. Notices of the AMS, 55(1), 8-15.
Gruijter, D. & Kamp, L. (2008). Statistical test theory for the behavioral sciences. London: Taylor & Francis Group.
Kabasakal, K. & Kelecioglu, H. (2015). Effect of differential item functioning on test equating. Educational Sciences: Theory & Practice, 15(5), 1229-1246.
Keller, L. & Keller, R. (2011). The long-term sustainability of different item response theory scaling methods. Educational and Psychological Measurement, 71(2), 362–379.
Kilmen, S. & Demirtasli, N. (2012). Comparison of test equating methods based on item response theory according to the sample size and ability distribution. Procedia - Social and Behavioral Sciences, 46, 130-134.
Kim, D., Choi, S., Lee, G. & Um, K. (2008). A comparison of the common-item and random-groups equating designs using empirical data. International Journal of Selection and Assessment, 16(2), 83-92.
Kim, J. (2007). A comparison of calibration methods and proficiency estimators for creating IRT vertical scales. (Unpublished doctoral dissertation), The University of Iowa.
Kolen, M. & Brennan, R. (2014). Test equating, scaling, and linking (3rd ed.). New York: Springer.
Langer, M. & Swanson, D. (2010). Practical considerations in equating progress tests. Medical Teacher, 32, 509-512.
Lee, E. (2013). Equating multidimensional tests under a random groups design: A comparison of various equating procedures. (Unpublished doctoral dissertation), The University of Iowa.
Livingston, A. (2014). Equating test scores (without IRT) (2nd ed.). Princeton, NJ: Educational Testing Service.
Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1), 1-11.
Michaelides, M. (2003, April 21-25). Sensitivity of IRT equating to behavior of test equating items. Paper presented at the Annual Meeting of the American Educational Research Association. Chicago, IL, 1-19.
Michaelides, M. (2006). Effects of misbehaving common items on aggregate scores and an application of the Mantel-Haenszel statistic in test equating. University of California, Los Angeles: Center for the Study of Evaluation (CSE).
Ozdemir, B. (2017). Equating TIMSS mathematics subtests with nonlinear equating methods using the NEAT design: Circle-arc equating approaches. International Journal of Progressive Education, 13(2), 116-132.
Pang, X., Madera, E., Radwan, N. & Zhang, S. (2010). A comparison of four test equating methods. Ontario: Report prepared for the Education Quality and Accountability Office (EQAO).
Reeve, B. (2002). An introduction to modern measurement theory. Outcomes Research Branch, National Cancer Institute: Applied Research Program.
Reise, S. & Waller, N. (2003). How many IRT parameters does it take to model psychopathology items? Psychological Methods, 8(2), 164-184.
Ryan, J. & Brockmann, F. (2009). A practitioner's introduction to equating with primers on classical test theory and item response theory. Washington: Council of Chief State School Officers (CCSSO).
Shyu, C. (2001). Estimating error indexes in estimating proficiencies and constructing confidence intervals in item response theory. (Unpublished doctoral dissertation), University of Iowa.
Song, T. (2009). Investigating different item response models in equating the Examination for the Certificate of Proficiency in English. Spaan Fellow Papers in Second or Foreign Language Assessment, 7, 85-98.
Stone, C. & Hansen, M. (2000). The effect of errors in estimating ability on goodness-of-fit tests for IRT models. Educational and Psychological Measurement, 60(6), 974-991.
Uysal, I. & Kilmen, S. (2016). Comparison of item response theory test equating methods for mixed-format tests. International Online Journal of Educational Sciences, 8(2), 1-11.
Von Davier, A. & Wilson, C. (2007). IRT true-score test equating: A guide through assumptions and applications. Educational and Psychological Measurement, 67(6), 940-957.
Wang, S., Zhang, M. & You, S. (2020). A comparison of IRT observed score kernel equating and several equating methods. Frontiers in Psychology, 11(308), 1-19.
Zhonghua, Z. (2010). Comparison of different equating methods and an application to link test-based tests. (Unpublished doctoral dissertation), The Chinese University of Hong Kong.