Estimating Group × Time Interaction in Scale-Transformed CEFR-J Self-Assessment Scores: A Case in Study-Abroad Research

GloDAL / ALS-Methoken 2026 • The Hang Seng University of Hong Kong • May 15–16, 2026
Ken Urano • Hokkai-Gakuen University, Japan

CONTEXT
Research Context

THREE-WEEK PROGRAM
▸ Intensive English course
▸ EBP company visits
▸ Homestay immersion

2 × 2 MIXED DESIGN
▸ Study Abroad: n = 12
▸ Comparison: n = 9

Primary parameter: Group × Time interaction (d)
Domains: Listening • Interaction • Production

Ken Urano • Hokkai-Gakuen University • ALS 2026, Hong Kong

INSTRUMENT
CEFR-J Self-Assessment

Levels, beginner to advanced: Pre-A1 → A1.1 → A1.2 → A1.3 → A2.1 → A2.2 → B1.1 → B1.2 → B2.1 → B2.2 → C1 → C2

1. Two descriptors per level: each level has two can-do statements (C1 & C2 have one each).
2. 5-point Likert scale: 1 = cannot perform → 5 = fully able; level score = mean.
3. Pre & post administration: same instrument for both groups; 3 domains assessed.

INSTRUMENT
Sample CEFR-J Descriptors

▸ Listening (A1.2): I can understand short conversations about familiar topics (e.g., hobbies, sports, club activities), provided they are delivered in slow and clear speech.
▸ Interaction (A2.2): I can interact in predictable everyday situations (e.g., a post office, a station, a shop), using a wide range of words and expressions.
▸ Production (B2.1): I can develop an argument clearly in a debate by providing evidence, provided the topic is of personal interest.

English descriptors: Negishi et al. (2013). One example per domain shown.

BACKGROUND
The Ordinal Problem

▸ CEFR-J is ordinal: levels are ranked Pre-A1 → C2, but the intervals between adjacent levels are not formally defined.
▸ Common assumption: integer weights (1, 2, 3…) are assigned and treated as interval-scaled, which is convenient but untested.
▸ The problem: the choice of transformation affects Cohen's d, so different choices can yield different conclusions.

METHOD
Three Scale Transformations

Here w is the weight given to a level and k is the level's rank (Pre-A1 = 1 through C2 = 12).

▸ Equal-interval (w = k): assumes uniform intervals between levels.
▸ Square-root (w = √k): diminishing returns at higher levels. ← focus of this study
▸ Squared (w = k²): accelerating gains at higher levels; amplifies upper levels.
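
The three weighting functions can be sketched in a few lines of Python. This is an illustrative sketch, not the author's code: it assumes k is the 1-based rank of a level in the order the instrument lists them, which matches the k values the deck gives for A2.1, B1.1, and C1. The names LEVELS, level_rank, and WEIGHTS are hypothetical.

```python
import math

# CEFR-J levels in ascending order, as listed on the instrument slide.
LEVELS = ["Pre-A1", "A1.1", "A1.2", "A1.3", "A2.1", "A2.2",
          "B1.1", "B1.2", "B2.1", "B2.2", "C1", "C2"]


def level_rank(level: str) -> int:
    """Return the 1-based rank k of a level (assumption: Pre-A1 = 1)."""
    return LEVELS.index(level) + 1


# The three transformations named in the deck, as functions of k.
WEIGHTS = {
    "equal": lambda k: float(k),       # w = k
    "sqrt": lambda k: math.sqrt(k),    # w = sqrt(k)
    "squared": lambda k: float(k**2),  # w = k^2
}

for level in ("A2.1", "B1.1", "C1"):
    k = level_rank(level)
    row = {name: round(fn(k), 2) for name, fn in WEIGHTS.items()}
    print(level, k, row)
```

Keeping the schemes in one dictionary makes it straightforward to loop over all of them when the same scores are re-weighted several ways.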

METHOD
How Transformation Affects a Score

Level   k    Equal-interval (w = k)   Sqrt (w = √k)   Squared (w = k²)
A2.1    5    5.0                      2.24            25
B1.1    7    7.0                      2.65            49
C1      11   11.0                     3.32            121

Domain score = Σ [ mean(rating₁, rating₂) × weight(k) ], summed over levels k

Example (contribution of one level): A2.1 rated 3 & 4 → mean = 3.5 | Equal-interval: 17.5 | Sqrt: 7.8 | Squared: 87.5
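
The domain-score formula and the A2.1 example can be reproduced with a short Python sketch. It is a hedged illustration under stated assumptions: ratings are stored per level rank k, levels with a single descriptor (C1, C2) simply contribute that one rating, and the names level_contribution and domain_score are hypothetical rather than taken from the deck.

```python
import math
from statistics import mean

# Weight functions for the three transformations (w as a function of rank k).
WEIGHTS = {
    "equal": lambda k: float(k),
    "sqrt": lambda k: math.sqrt(k),
    "squared": lambda k: float(k**2),
}


def level_contribution(ratings, k, scheme):
    """Mean Likert rating at one level, multiplied by that level's weight."""
    return mean(ratings) * WEIGHTS[scheme](k)


def domain_score(ratings_by_rank, scheme):
    """Sum of weighted level means; ratings_by_rank maps k -> list of ratings."""
    return sum(level_contribution(r, k, scheme)
               for k, r in ratings_by_rank.items())


# Worked example from the deck: A2.1 (k = 5) rated 3 and 4.
example = {name: round(level_contribution([3, 4], 5, name), 1)
           for name in WEIGHTS}
print(example)  # {'equal': 17.5, 'sqrt': 7.8, 'squared': 87.5}
```

Under the squared scheme the same pair of ratings contributes five times as much as under the equal-interval scheme, which is why the choice of weights can move the effect size.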

METHOD
Weighting Schemes: Conceptual Overview

[Figure: conceptual overview of the weighting schemes; image not included in the transcript]

METHOD
Effect Size: Formula & Benchmarks

Cohen's d (Group × Time interaction), where SA = Study Abroad group and CG = Comparison group:

$$d = \frac{(M_{\mathrm{SA,Post}} - M_{\mathrm{SA,Pre}}) - (M_{\mathrm{CG,Post}} - M_{\mathrm{CG,Pre}})}{SD_{\mathrm{pooled}}}$$

Benchmark                 Small   Medium   Large
Cohen (1988)              0.20    0.50     0.80
Plonsky & Oswald (2014)   0.40    0.70     1.00

This study applies the Plonsky & Oswald (2014) benchmarks, which are calibrated for L2 research.
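
A minimal Python sketch of the interaction effect size follows. The deck does not say how SD_pooled is computed, so this sketch assumes the pooled pretest standard deviation of the two groups; a different pooling (for example, of gain scores) would give different values. The function names and the scores in the example are invented for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled sample SD of two groups (assumption: applied to pretest scores)."""
    n1, n2 = len(a), len(b)
    return math.sqrt(((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2)
                     / (n1 + n2 - 2))


def interaction_d(sa_pre, sa_post, cg_pre, cg_post):
    """Difference in mean gains (SA minus CG) divided by the pooled SD."""
    gain_diff = (mean(sa_post) - mean(sa_pre)) - (mean(cg_post) - mean(cg_pre))
    return gain_diff / pooled_sd(sa_pre, cg_pre)


def plonsky_oswald_label(d):
    """Label |d| using the 0.40 / 0.70 / 1.00 benchmarks quoted in the deck."""
    size = abs(d)
    if size >= 1.00:
        return "large"
    if size >= 0.70:
        return "medium"
    if size >= 0.40:
        return "small"
    return "below small"


# Invented scores, for illustration only.
d_example = interaction_d(sa_pre=[10, 12, 14], sa_post=[13, 15, 17],
                          cg_pre=[11, 13, 15], cg_post=[12, 14, 16])
print(round(d_example, 2), plonsky_oswald_label(d_example))
```

In the invented example the Study Abroad group gains 3 points and the Comparison group gains 1, and the pooled pretest SD is 2, so d = (3 − 1) / 2 = 1.0.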

RESULTS
Effect Sizes Across Transformations

Cohen's d by domain and transformation (values read from the bar chart):

Domain        Equal-interval   Sqrt   Squared   Plonsky & Oswald size
Listening     0.59             0.72   0.43      Small–medium
Interaction   0.32             0.46   0.13      Small
Production    0.88             0.97   0.75      Medium–large

Sqrt yielded the largest d; Squared the smallest. The Production effect is medium–large across all transformations.
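
The ordering claim can be checked directly against the reported values. The Python sketch below assumes the nine chart values are grouped by transformation (Equal-interval, Sqrt, Squared) across Listening, Interaction, and Production; that reading agrees with the ranges quoted in the summary (Production 0.75–0.97, Interaction 0.13–0.46). The name REPORTED_D is hypothetical.

```python
# Reported Cohen's d values, keyed by domain and transformation.
REPORTED_D = {
    "Listening":   {"equal": 0.59, "sqrt": 0.72, "squared": 0.43},
    "Interaction": {"equal": 0.32, "sqrt": 0.46, "squared": 0.13},
    "Production":  {"equal": 0.88, "sqrt": 0.97, "squared": 0.75},
}

summary = {}
for domain, d_by_scheme in REPORTED_D.items():
    # Transformations ranked from largest to smallest d.
    ranking = sorted(d_by_scheme, key=d_by_scheme.get, reverse=True)
    # How far d moves purely because of the transformation chosen.
    spread = round(max(d_by_scheme.values()) - min(d_by_scheme.values()), 2)
    summary[domain] = (ranking, spread)
    print(f"{domain}: {' > '.join(ranking)}, spread = {spread}")
```

Every domain shows the same ranking (sqrt > equal > squared), while the spread attributable to the transformation alone runs from 0.22 (Production) to 0.33 (Interaction).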

RESULTS
Optimal Weight Analysis

[Figure: optimal weight analysis; image not included in the transcript]

DISCUSSION
Methodological Implications

▸ Transformation is not neutral: each choice encodes assumptions about the structure of the CEFR-J scale.
▸ Sqrt yielded the largest d: Squared yielded the smallest and Equal-interval fell in between, consistently across all domains.
▸ Direction was robust: Study Abroad > Comparison under all transformations; only the magnitude varied.
▸ Domain-specific effects: Production was largest and Interaction smallest, consistently across all transformations.

Summary

RQ: How do scale transformations affect Group × Time effect sizes in CEFR-J data?
Finding: Sqrt yielded the largest d, Squared the smallest, and Equal-interval was intermediate. Production: d = 0.75–0.97. Interaction: d = 0.13–0.46.
Recommendation: Try multiple transformations and report each; the direction is consistent, but the magnitude varies.
Next steps: Larger N, alternative effect size measures (e.g., ε²), replication across institutions.

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge University Press.
Council of Europe. (2020). Common European framework of reference for languages: Companion volume. Council of Europe Publishing.
Negishi, M., Takada, T., & Tono, Y. (2013). A progress report on the development of the CEFR-J. In E. D. Galaczi & C. J. Weir (Eds.), Exploring language frameworks (pp. 135–163). Cambridge University Press.
Plonsky, L., & Oswald, F. L. (2014). How big is big? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912. https://doi.org/10.1111/lang.12079
