Pathways for instructionally embedded assessment (PIE) proof of concept: Potential future summative uses of an instructionally embedded assessment model

In this session, we will share information about test design options and innovative scoring models for a through-year assessment model that provides both finer-grained information to guide instruction and summative achievement results.

Jake Thompson

June 25, 2025

Transcript

  1. SPEAKERS
    PIE Proof of Concept
    Wednesday, June 25, 2025 | 12:15–1:00 p.m.
    Jake Thompson & Brooke Nash • Accessible Teaching, Learning, and Assessment Systems (ATLAS)
    Shaun Bates • Missouri Department of Elementary and Secondary Education
  2. SESSION OBJECTIVES
    • Describe the benefits and utility of summative achievement results based on instructionally embedded assessments.
    • Define potential roles and associated design considerations for an end-of-year component in an instructionally embedded assessment system.
    • List the inferences supported by different summative scoring models for an instructionally embedded assessment.
  3. BACKGROUND
    • The Pathways for Instructionally Embedded Assessment (PIE) is a CGSA-funded project aimed at developing a proof-of-concept innovative assessment, piloted in classrooms during the 2024-2025 school year.
    • The overarching goal of the pilot study was to evaluate PIE assessment results for multiple potential purposes. The focus of this presentation is on how results from the instructionally embedded assessments can be used for summative purposes.
  4. OVERVIEW OF THE PIE ASSESSMENT MODEL
    1. Learning Pathways
    2. Instructionally Embedded Assessment Delivery
    3. Actionable Results
  5. REPORTING
    • Results are reported as a mastery profile
    • Summarizes KSUs mastered by the student during the instructionally embedded window
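
A mastery profile of this kind can be pictured as a mapping from each assessed content standard to the KSUs a student has mastered. The sketch below is only an illustration of that data structure: the standard and KSU labels are hypothetical, and the function name `mastery_profile` is not part of the PIE system.

```python
from collections import defaultdict

# Hypothetical records: (content standard, KSU, mastered?).
# The labels are invented for illustration only.
results = [
    ("Standard A", "KSU 1", True),
    ("Standard A", "KSU 2", True),
    ("Standard A", "KSU 3", False),
    ("Standard B", "KSU 1", True),
    ("Standard B", "KSU 2", False),
]


def mastery_profile(records):
    """Group the mastered KSUs by content standard.

    Every assessed standard appears in the result, even when no KSU
    under it has been mastered yet.
    """
    profile = defaultdict(list)
    for standard, ksu, mastered in records:
        mastered_ksus = profile[standard]  # creates the entry if missing
        if mastered:
            mastered_ksus.append(ksu)
    return dict(profile)


profile = mastery_profile(results)
for standard, ksus in profile.items():
    print(standard, "->", ", ".join(ksus) if ksus else "none yet")
```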
  6. FROM EMBEDDED TO SUMMATIVE REPORTING
    • Result uses should be consistent with the PIE Theory of Action
    • Mastery results provide instructionally useful information
    • Summative results reflect achievement of content standards
    • Embed assessments into instruction to measure skill/competency acquisition as it occurs, and then summarize that information
    • End-of-year assessments may be optionally included depending on specific claims of the assessment system
  7. MODELS UNDER CONSIDERATION
    • Traditional scale score model
    • Diagnostic model
    • Hybrid model combining diagnostic and scale score features
  8. MODELING OVERVIEW
    Advantages
      Scale Score Model
        • Widely used
        • Well tested
        • Familiar to stakeholders
      Diagnostic Model
        • Well tested
        • Instructionally-relevant grain-size
        • Consistent with embedded results
      Hybrid Model
        • Supports both instructionally-relevant and overall results
        • Scale score can be incorporated into existing accountability systems
    Disadvantages
      Scale Score Model
        • Inconsistent with embedded results across profiles
        • Not well-suited to instructional decisions
        • Unreliable subscores
      Diagnostic Model
        • Not easy to synthesize a whole profile (e.g., “is my student on track?”)
        • Unfamiliar to many stakeholders
      Hybrid Model
        • Untested; requires research to understand and support intended uses
  9. MODEL EVALUATION
    • Model fit for each model assessed using posterior predictive model checks
    • Methodological details described in Thompson (2024)
    • Reliability of scale score or mastery classifications
    Thompson (2024)
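
As a rough illustration of what a posterior predictive check computes, the sketch below derives a posterior predictive p-value (ppp) as the share of replicated data sets whose discrepancy statistic is at least as large as the observed one. This is the generic textbook definition, not necessarily the exact procedure in Thompson (2024); the function name and all numbers are hypothetical.

```python
from typing import Sequence


def posterior_predictive_p(observed: float, replicated: Sequence[float]) -> float:
    """Share of replicated statistics at least as large as the observed one."""
    if not replicated:
        raise ValueError("need at least one replicated statistic")
    exceed = sum(1 for value in replicated if value >= observed)
    return exceed / len(replicated)


# Hypothetical discrepancy statistics computed on data sets simulated
# from posterior draws of a fitted model (made-up numbers).
replicated_stats = [10.2, 11.5, 9.8, 12.1, 10.9, 13.4, 11.0, 9.5, 12.7, 10.1]
observed_stat = 11.2

ppp = posterior_predictive_p(observed_stat, replicated_stats)
adequate_fit = ppp > 0.05  # the ppp > .05 threshold quoted in the results
print(f"ppp = {ppp:.2f}, adequate fit: {adequate_fit}")
```

A ppp near 0 would mean the observed data look extreme relative to what the model predicts; in practice the number of replicated data sets would be far larger than the ten used here.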
  10. DATA
    • Inclusion criteria:
    • Students must have completed at least one content standard in the instructionally embedded window
    • 1,572 5th grade students in Missouri
    • 55 teachers from 28 districts and 32 schools
    • Students completed an average of 12 standards
  11. RESULTS: ABSOLUTE FIT
    • All three models showed adequate model fit (i.e., ppp > .05)
    • Traditional scale score model (2PL/GRM) and hybrid model (Beta IRT) showed good recovery of the student raw score distribution
    • Diagnostic models showed adequate model fit for the majority of models
    • 25 estimated diagnostic models (1 per content standard)
    • 21 demonstrated adequate model fit
  12. RESULTS: RELIABILITY
    • Both traditional scale score and hybrid model showed good reliability with low standard errors of measurement
    • Hybrid model more consistent over the range of the latent trait
    • All diagnostic models showed high levels of classification accuracy and consistency
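
To make the idea of a standard error that varies over the latent trait concrete, the sketch below uses the standard 2PL item information formula, a² · P · (1 − P), and takes the standard error of measurement as 1 / sqrt(test information). It is a generic IRT illustration with hypothetical item parameters, not the PIE models or their estimates.

```python
import math


def two_pl_probability(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def standard_error(theta: float, items) -> float:
    """SEM at theta: 1 / sqrt(sum of 2PL item information)."""
    information = 0.0
    for a, b in items:
        p = two_pl_probability(theta, a, b)
        information += a * a * p * (1.0 - p)
    return 1.0 / math.sqrt(information)


# Hypothetical (discrimination, difficulty) pairs for three items.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]

sem_by_theta = {theta: standard_error(theta, items) for theta in (-3.0, 0.0, 3.0)}
for theta, sem in sem_by_theta.items():
    print(f"theta = {theta:+.1f}  SEM = {sem:.2f}")
```

With these made-up items the standard error is smallest near the middle of the scale and grows toward the extremes, which is the kind of unevenness the "more consistent over the range of the latent trait" comparison refers to.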
  13. CONCLUSIONS
    • Based on these results, all three models met evaluation standards for technical adequacy
    • Sufficient levels of both model fit and reliability
    • Implementation should be driven by consistency with theory of action and stakeholder needs
  14. RECOMMENDATIONS FOR FUTURE IMPLEMENTATION
    Support for relevant claims in the Theory of Action provided by each scoring model:
    Claim I: Mastery results represent what students know and can do relative to the learning pathways.
      • Scale Score Model: Not supported
      • Diagnostic Model: Results reported directly as the set of mastery KSUs
      • Hybrid Model: Mastery results directly inform summative scale score
    Claim K: Summative results accurately reflect student achievement of grade-level academic content standards.
      • Scale Score Model: Supported with a single scale score
      • Diagnostic Model: Supported with a profile of mastered KSUs
      • Hybrid Model: Supported with both scale score and diagnostic profile
    Claim L: Educators make instructional decisions based on data from the PIE assessments.
      • Scale Score Model: Not well suited to instructional decision-making
      • Diagnostic Model: Instructional decision-making based on mastery profile
      • Hybrid Model: Instructional decision-making based on mastery profile
    Claim M: Students make progress towards mastery of grade-level content standards.
      • Scale Score Model: Supported with existing growth models
      • Diagnostic Model: Additional research needed to evaluate profile-based growth
      • Hybrid Model: Supported with existing growth models
  15. ADDITIONAL CONSIDERATIONS
    • Findings indicate that instructionally embedded results can "stand alone" to better meet stakeholder needs
    • Reduce end-of-year testing burden
    • Timely and instructionally relevant results
    • Summative results that align to existing accountability systems
    • Optional end-of-year testing could be administered as needed
    • May or may not be included in scoring model to inform results
    • Opportunity for students to test on missed content (e.g., moved schools)
    • Use matrix sampling to gauge where buildings or schools are at the end of the year
  16. • Define potential roles and associated design considerations for an end-of-year component in an instructionally embedded assessment system
    • Missouri will continue to need a growth measure; with this model, can we measure year-to-year growth and within-year growth of students?
    • Our design needed to be focused on the primary users of the system. DESE and LEAs want to support teachers, parents and the students through their learning.
    • Design considerations
    • How do we attempt to mitigate behavioral changes when a system becomes part of accountability?
    • How do we support our teachers and instructional pedagogies?
    • How do we support our transient population?
  17. • Missouri is pursuing an IADA
    • Our focus is supporting a competency-based model and traditional scope-and-sequence-based instruction
    • Scalability
    • Learning maps development
    • Funding
  18. GET IN TOUCH!
    W. Jake Thompson & Brooke Nash
    ATLAS, University of Kansas
    ✉ [email protected]
    ✉ [email protected]
    https://pie.atlas4learning.org
    https://atlas.ku.edu
    atlas4learning
    Shaun Bates
    Missouri DESE
    ✉ [email protected]
    https://dese.mo.gov
    MOEducation
  19. THANK YOU
    Don’t forget to log in to the mobile app to complete the session survey!
    Save the Date - #NCSA2026
    Austin, Texas • June 22-24, 2026