Pathways for instructionally embedded assessment (PIE) proof of concept: Potential future summative uses of an instructionally embedded assessment model
In this session, we will share test design options and innovative scoring models for a through-year assessment model that provides both finer-grained information to guide instruction and summative achievement results.
12:15–1:00 p.m.
Jake Thompson & Brooke Nash • Accessible Teaching, Learning, and Assessment Systems (ATLAS)
Shaun Bates • Missouri Department of Elementary and Secondary Education
Learning objectives:
• Describe scoring models that support summative achievement results based on instructionally embedded assessments.
• Define potential roles and associated design considerations for an end-of-year component in an instructionally embedded assessment system.
• List the inferences supported by different summative scoring models for an instructionally embedded assessment.
• Pathways for Instructionally Embedded Assessment (PIE) is a CGSA-funded project aimed at developing a proof-of-concept innovative assessment, piloted in classrooms during the 2024–2025 school year.
• The overarching goal of the pilot study was to evaluate PIE assessment results for multiple potential purposes. This presentation focuses on how results from the instructionally embedded assessments can be used for summative purposes.
Scoring models must be consistent with the PIE Theory of Action:
• Mastery results provide instructionally useful information
• Summative results reflect achievement of content standards
• Embed assessments into instruction to measure skill/competency acquisition as it occurs, then summarize that information
• End-of-year assessments may optionally be included, depending on the specific claims of the assessment system
Advantages and disadvantages of candidate scoring models:
• Traditional scale score model
  – Advantages: widely used; well tested; familiar to stakeholders
  – Disadvantages: inconsistent with embedded results across profiles; not well suited to instructional decisions; unreliable subscores
• Diagnostic (mastery) model
  – Advantages: well tested; instructionally relevant grain size; consistent with embedded results
  – Disadvantages: not easy to synthesize a whole profile (e.g., “is my student on track?”); unfamiliar to many stakeholders
• Hybrid model
  – Advantages: supports both instructionally relevant and overall results; scale score can be incorporated into existing accountability systems
  – Disadvantages: untested; requires research to understand and support intended uses
Evaluation criteria:
• Model fit, evaluated with posterior predictive model checks (see the sketch below); methodological details are described in Thompson (2024)
• Reliability of the scale score or mastery classifications (Thompson, 2024)
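To make the model-fit criterion concrete, here is a minimal Python sketch of a posterior predictive p-value (ppp) check for a 2PL model. This is an illustration, not the PIE implementation from Thompson (2024): the response data and the "posterior draws" are simulated stand-ins, and the discrepancy measure (variance of student raw scores) is one arbitrary choice among many.

```python
# Illustrative ppp check for a 2PL IRT model; all data simulated.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_items, n_draws = 500, 20, 200

# Generating parameters for the simulated "observed" responses
theta = rng.normal(size=n_students)
a = rng.lognormal(0, 0.3, n_items)          # discriminations
b = rng.normal(0, 1, n_items)               # difficulties
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
observed = rng.binomial(1, p)

def discrepancy(resp):
    """Example discrepancy measure: variance of student raw scores."""
    return resp.sum(axis=1).var()

# Stand-ins for posterior draws; in practice these would come from the
# fitted model's MCMC output (e.g., Stan).
ppp_count = 0
for _ in range(n_draws):
    theta_d = theta + rng.normal(0, 0.15, n_students)
    a_d = a * rng.lognormal(0, 0.05, n_items)
    b_d = b + rng.normal(0, 0.05, n_items)
    p_d = 1 / (1 + np.exp(-a_d * (theta_d[:, None] - b_d)))
    replicated = rng.binomial(1, p_d)
    ppp_count += discrepancy(replicated) >= discrepancy(observed)

ppp = ppp_count / n_draws
print(f"ppp = {ppp:.3f}  (adequate fit when ppp > .05)")
```

The logic is the general one behind posterior predictive checks: replicate data from each posterior draw, compare a discrepancy measure on replicated versus observed data, and flag misfit when the observed value is extreme relative to the replications.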
Pilot sample: students who completed at least one content standard in the instructionally embedded window
• 1,572 5th-grade students in Missouri
• 55 teachers from 28 districts and 32 schools
• Students completed an average of 12 standards
RESULTS: MODEL FIT
• Both scale score models demonstrated adequate model fit (i.e., ppp > .05)
• Traditional scale score model (2PL/GRM) and hybrid model (Beta IRT) showed good recovery of the student raw score distribution (see the sketch below)
• Diagnostic models showed adequate model fit in the majority of cases: of the 25 estimated diagnostic models (1 per content standard), 21 demonstrated adequate fit
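One simple way to summarize "recovery of the raw score distribution" is interval coverage: check whether the observed proportion at each raw score falls inside the posterior predictive interval of the replicated proportions. The sketch below is hypothetical; both the observed scores and the replications are simulated placeholders rather than PIE data.

```python
# Raw-score distribution recovery check via 95% predictive intervals.
import numpy as np

rng = np.random.default_rng(3)
n_items = 20
observed_scores = rng.binomial(n_items, 0.6, size=1500)           # stand-in data
replicated_scores = rng.binomial(n_items, 0.6, size=(200, 1500))  # stand-in draws

bins = np.arange(n_items + 2)  # one bin per possible raw score 0..20
obs_hist, _ = np.histogram(observed_scores, bins=bins, density=True)
rep_hists = np.stack([np.histogram(r, bins=bins, density=True)[0]
                      for r in replicated_scores])

# Good recovery: observed proportions sit inside the 95% interval of the
# model-implied (replicated) proportions at each raw score.
lo, hi = np.quantile(rep_hists, [0.025, 0.975], axis=0)
covered = (obs_hist >= lo) & (obs_hist <= hi)
print(f"{covered.mean():.0%} of raw-score points inside the 95% interval")
```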
RESULTS: RELIABILITY
• Both scale score models showed high reliability with low standard errors of measurement
• The hybrid model was more consistent over the range of the latent trait
• All diagnostic models showed high levels of classification accuracy and consistency
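For the diagnostic models, classification accuracy and consistency can be estimated directly from each student's posterior probability of mastery. The sketch below shows one common form of these attribute-level indices; the probabilities are simulated stand-ins, not PIE results, and the accuracy/consistency formulas are standard approximations rather than the specific estimators used in the project.

```python
# Attribute-level classification accuracy and consistency from posterior
# mastery probabilities; probabilities simulated for illustration.
import numpy as np

rng = np.random.default_rng(7)
# P(master) for one attribute across 1,000 students; in practice these
# come from the fitted diagnostic classification model.
p_master = rng.beta(0.5, 0.5, size=1000)

classified_master = p_master >= 0.5

# Accuracy: expected agreement between the classification and true status.
accuracy = np.where(classified_master, p_master, 1 - p_master).mean()

# Consistency: expected agreement between classifications on two
# independent replications of the assessment.
consistency = (p_master**2 + (1 - p_master) ** 2).mean()

print(f"classification accuracy    = {accuracy:.3f}")
print(f"classification consistency = {consistency:.3f}")
```

Intuitively, both indices are high when posterior mastery probabilities are close to 0 or 1, i.e., when the assessment cleanly separates masters from non-masters.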
• All scoring models met evaluation standards for technical adequacy, with sufficient levels of both model fit and reliability
• Implementation decisions should therefore be driven by consistency with the theory of action and stakeholder needs
Support for relevant claims in the Theory of Action provided by each scoring model:
• Claim I: Mastery results represent what students know and can do relative to the learning pathways.
  – Traditional scale score model: not supported
  – Diagnostic model: results reported directly as the set of mastered KSUs
  – Hybrid model: mastery results directly inform the summative scale score
• Claim K: Summative results accurately reflect student achievement of grade-level academic content standards.
  – Traditional scale score model: supported with a single scale score
  – Diagnostic model: supported with a profile of mastered KSUs
  – Hybrid model: supported with both a scale score and a diagnostic profile
• Claim L: Educators make instructional decisions based on data from the PIE assessments.
  – Traditional scale score model: not well suited to instructional decision-making
  – Diagnostic model: instructional decision-making based on the mastery profile
  – Hybrid model: instructional decision-making based on the mastery profile
• Claim M: Students make progress towards mastery of grade-level content standards.
  – Traditional scale score model: supported with existing growth models
  – Diagnostic model: additional research needed to evaluate profile-based growth
  – Hybrid model: supported with existing growth models
"stand alone" to better meet stakeholder needs • Reduce end of year testing burden • Timely and instructionally relevant results • Summative results that align to existing accountability systems • Optional end-of-year testing could be administered as needed • May or may not be included in scoring model to inform results • Opportunity for students to test on missed content (e.g., moved schools) • Use matrix sampling to gauge where buildings or schools are at the end of the year
Define potential roles and associated design considerations for an end-of-year component in an instructionally embedded assessment system
• Missouri will continue to need a growth measure; with this model, can we measure both year-to-year and within-year student growth?
• Our design needed to focus on the primary users of the system: DESE and LEAs want to support teachers, parents, and students through their learning.
• Design considerations:
  – How do we mitigate behavioral changes when a system becomes part of accountability?
  – How do we support our teachers and instructional pedagogies?
  – How do we support our transient population?