Pathways for instructionally embedded assessment (PIE) proof of concept: Potential future summative uses of an instructionally embedded assessment model
In this session, we will share test design options and innovative scoring models for a through-year assessment model that provides both finer-grained information to guide instruction and summative achievement results.
12:15–1:00 p.m.
Jake Thompson & Brooke Nash • Accessible Teaching, Learning, and Assessment Systems (ATLAS)
Shaun Bates • Missouri Department of Elementary and Secondary Education
Learning objectives:
• Describe scoring models that support summative achievement results based on instructionally embedded assessments.
• Define potential roles and associated design considerations for an end-of-year component in an instructionally embedded assessment system.
• List the inferences supported by different summative scoring models for an instructionally embedded assessment.
• Pathways for Instructionally Embedded Assessment (PIE) is a CGSA-funded project aimed at developing a proof-of-concept innovative assessment, piloted in classrooms during the 2024–2025 school year.
• The overarching goal of the pilot study was to evaluate PIE assessment results for multiple potential purposes. This presentation focuses on how results from the instructionally embedded assessments can be used for summative purposes.
Scoring models must be consistent with the PIE Theory of Action:
• Mastery results provide instructionally useful information
• Summative results reflect achievement of content standards
• Embed assessments into instruction to measure skill/competency acquisition as it occurs, then summarize that information
• End-of-year assessments may optionally be included, depending on the specific claims of the assessment system
Advantages and disadvantages of candidate scoring models:
• Traditional scale score model
  – Advantages: widely used; well tested; familiar to stakeholders
  – Disadvantages: inconsistent with embedded results across profiles; not well suited to instructional decisions; unreliable subscores
• Diagnostic (mastery) model
  – Advantages: well tested; instructionally relevant grain size; consistent with embedded results
  – Disadvantages: not easy to synthesize a whole profile (e.g., “is my student on track?”); unfamiliar to many stakeholders
• Hybrid model
  – Advantages: supports both instructionally relevant and overall results; scale score can be incorporated into existing accountability systems
  – Disadvantages: untested; requires research to understand and support intended uses
Evaluation criteria:
• Model fit, evaluated with posterior predictive model checks (see the sketch below); methodological details are described in Thompson (2024)
• Reliability of the scale score or mastery classifications (Thompson, 2024)
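To make the model-fit criterion concrete, here is a minimal Python sketch of a posterior predictive p-value (ppp) check for a 2PL model. This is an illustration, not the PIE implementation from Thompson (2024): the response data and the "posterior draws" are simulated stand-ins, and the discrepancy measure (variance of student raw scores) is one arbitrary choice among many.

```python
# Illustrative ppp check for a 2PL IRT model; all data simulated.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_items, n_draws = 500, 20, 200

# Generating parameters for the simulated "observed" responses
theta = rng.normal(size=n_students)
a = rng.lognormal(0, 0.3, n_items)          # discriminations
b = rng.normal(0, 1, n_items)               # difficulties
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
observed = rng.binomial(1, p)

def discrepancy(resp):
    """Example discrepancy measure: variance of student raw scores."""
    return resp.sum(axis=1).var()

# Stand-ins for posterior draws; in practice these would come from the
# fitted model's MCMC output (e.g., Stan).
ppp_count = 0
for _ in range(n_draws):
    theta_d = theta + rng.normal(0, 0.15, n_students)
    a_d = a * rng.lognormal(0, 0.05, n_items)
    b_d = b + rng.normal(0, 0.05, n_items)
    p_d = 1 / (1 + np.exp(-a_d * (theta_d[:, None] - b_d)))
    replicated = rng.binomial(1, p_d)
    ppp_count += discrepancy(replicated) >= discrepancy(observed)

ppp = ppp_count / n_draws
print(f"ppp = {ppp:.3f}  (adequate fit when ppp > .05)")
```

The logic is the general one behind posterior predictive checks: replicate data from each posterior draw, compare a discrepancy measure on replicated versus observed data, and flag misfit when the observed value is extreme relative to the replications.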
Pilot sample: students who completed at least one content standard in the instructionally embedded window
• 1,572 5th-grade students in Missouri
• 55 teachers from 28 districts and 32 schools
• Students completed an average of 12 standards
RESULTS: MODEL FIT
• Both scale score models demonstrated adequate model fit (i.e., ppp > .05)
• Traditional scale score model (2PL/GRM) and hybrid model (Beta IRT) showed good recovery of the student raw score distribution (see the sketch below)
• Diagnostic models showed adequate model fit in the majority of cases: of the 25 estimated diagnostic models (1 per content standard), 21 demonstrated adequate fit
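One simple way to summarize "recovery of the raw score distribution" is interval coverage: check whether the observed proportion at each raw score falls inside the posterior predictive interval of the replicated proportions. The sketch below is hypothetical; both the observed scores and the replications are simulated placeholders rather than PIE data.

```python
# Raw-score distribution recovery check via 95% predictive intervals.
import numpy as np

rng = np.random.default_rng(3)
n_items = 20
observed_scores = rng.binomial(n_items, 0.6, size=1500)           # stand-in data
replicated_scores = rng.binomial(n_items, 0.6, size=(200, 1500))  # stand-in draws

bins = np.arange(n_items + 2)  # one bin per possible raw score 0..20
obs_hist, _ = np.histogram(observed_scores, bins=bins, density=True)
rep_hists = np.stack([np.histogram(r, bins=bins, density=True)[0]
                      for r in replicated_scores])

# Good recovery: observed proportions sit inside the 95% interval of the
# model-implied (replicated) proportions at each raw score.
lo, hi = np.quantile(rep_hists, [0.025, 0.975], axis=0)
covered = (obs_hist >= lo) & (obs_hist <= hi)
print(f"{covered.mean():.0%} of raw-score points inside the 95% interval")
```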
RESULTS: RELIABILITY
• Both scale score models showed high reliability with low standard errors of measurement
• The hybrid model was more consistent over the range of the latent trait
• All diagnostic models showed high levels of classification accuracy and consistency
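For the diagnostic models, classification accuracy and consistency can be estimated directly from each student's posterior probability of mastery. The sketch below shows one common form of these attribute-level indices; the probabilities are simulated stand-ins, not PIE results, and the accuracy/consistency formulas are standard approximations rather than the specific estimators used in the project.

```python
# Attribute-level classification accuracy and consistency from posterior
# mastery probabilities; probabilities simulated for illustration.
import numpy as np

rng = np.random.default_rng(7)
# P(master) for one attribute across 1,000 students; in practice these
# come from the fitted diagnostic classification model.
p_master = rng.beta(0.5, 0.5, size=1000)

classified_master = p_master >= 0.5

# Accuracy: expected agreement between the classification and true status.
accuracy = np.where(classified_master, p_master, 1 - p_master).mean()

# Consistency: expected agreement between classifications on two
# independent replications of the assessment.
consistency = (p_master**2 + (1 - p_master) ** 2).mean()

print(f"classification accuracy    = {accuracy:.3f}")
print(f"classification consistency = {consistency:.3f}")
```

Intuitively, both indices are high when posterior mastery probabilities are close to 0 or 1, i.e., when the assessment cleanly separates masters from non-masters.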
• All scoring models met evaluation standards for technical adequacy, with sufficient levels of both model fit and reliability
• Implementation decisions should therefore be driven by consistency with the theory of action and stakeholder needs
Support for relevant claims in the Theory of Action provided by each scoring model:
• Claim I: Mastery results represent what students know and can do relative to the learning pathways.
  – Traditional scale score model: not supported
  – Diagnostic model: results reported directly as the set of mastered KSUs
  – Hybrid model: mastery results directly inform the summative scale score
• Claim K: Summative results accurately reflect student achievement of grade-level academic content standards.
  – Traditional scale score model: supported with a single scale score
  – Diagnostic model: supported with a profile of mastered KSUs
  – Hybrid model: supported with both a scale score and a diagnostic profile
• Claim L: Educators make instructional decisions based on data from the PIE assessments.
  – Traditional scale score model: not well suited to instructional decision-making
  – Diagnostic model: instructional decision-making based on the mastery profile
  – Hybrid model: instructional decision-making based on the mastery profile
• Claim M: Students make progress towards mastery of grade-level content standards.
  – Traditional scale score model: supported with existing growth models
  – Diagnostic model: additional research needed to evaluate profile-based growth
  – Hybrid model: supported with existing growth models
"stand alone" to better meet stakeholder needs • Reduce end of year testing burden • Timely and instructionally relevant results • Summative results that align to existing accountability systems • Optional end-of-year testing could be administered as needed • May or may not be included in scoring model to inform results • Opportunity for students to test on missed content (e.g., moved schools) • Use matrix sampling to gauge where buildings or schools are at the end of the year
Define potential roles and associated design considerations for an end-of-year component in an instructionally embedded assessment system
• Missouri will continue to need a growth measure; with this model, can we measure both year-to-year and within-year student growth?
• Our design needed to focus on the primary users of the system: DESE and LEAs want to support teachers, parents, and students through their learning.
• Design considerations:
  – How do we mitigate behavioral changes when a system becomes part of accountability?
  – How do we support our teachers and instructional pedagogies?
  – How do we support our transient population?