
The Fellowship of Trust in AI

Keynote at MODELS 2024, Linz, Austria.
https://conf.researchr.org/home/models-2024

In the realm of software, an AI revolution is afoot, transforming how we create and consume our digital world. In this keynote, I shall share initial observations on the evolution of software engineering and its profound impact on developer productivity and experience. Like the forging of powerful artifacts, AI-driven tools are reshaping development processes, bringing unprecedented efficiencies yet also presenting new trials. Central to this grand transformation is the vital role of trust in AI-based software tools. Understanding and nurturing this trust is paramount for their successful adoption and integration. Moreover, I will reveal why the MODELS community stands as a pivotal fellowship in this epic journey, guiding us through the challenges and triumphs of the AI age. Join us as we embark on this transformative quest, bridging trust, innovation, and productivity in the dawn of AI and software engineering. (This text has been rephrased by the author using ChatGPT to reflect a different style while maintaining the original meaning and contents.)

Thomas Zimmermann

September 27, 2024


Transcript

  1. In the realm of software, an AI revolution is afoot, transforming how we create and consume our digital world.
  2. Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. Taking Flight with Copilot: Early Insights and Opportunities of AI-Powered Pair-Programming Tools. ACM Queue, Volume 20, Issue 6, November/December 2022, pp. 35–57. Developers reported spending less time on Stack Overflow due to Copilot's code suggestions. Developers' roles shifted from primarily writing code to reviewing and understanding code suggested by AI. Copilot opened new learning opportunities, such as mastering new programming languages. Developers' trust plays a crucial role in adoption, as any unexpected behavior can significantly impact usage.
  3. Trust matters for tool adoption and tool usage. So many tools... yet so few in use! Meanwhile, tools continue to emerge and evolve from traditional to AI-assisted tools. Lack of trust in a tool can lead to suboptimal use and poor outcomes.
  4. Designing AI systems for responsible trust is important. "Overreliance on AI occurs when users start accepting incorrect AI outputs. This can lead to issues and errors... An important goal of AI system design is to empower users to develop appropriate reliance on AI." (Passi and Vorvoreanu, 2022)
  5. Developers' calibrated trust in AI is a prerequisite for their safe and effective use of AI tools. Lack of trust hinders adoption; blind trust leads to overlooking mistakes.
  6. What factors influence developers' trust in software tools? Brittany Johnson, Christian Bird, Denae Ford, Nicole Forsgren, Thomas Zimmermann: Make Your Tools Sparkle with Trust: The PICSE Framework for Trust in Software Tools. ICSE-SEIP 2023: 409–419.
  7. What factors influence developers' trust in software tools? Method: interviews with 18 practitioners, in which participants define and discuss trust in tools and collaborators; transcribe; code (codebook v1); thematic analysis; validate, yielding the PICSE factors. Validation: survey with 300+ practitioners.
  8. What factors influence developers' trust in software tools? The PICSE framework: Personal (intrinsic, extrinsic, and social factors), Interaction (factors related to engagement with the tool), Control (factors related to control over usage), System (properties of the tool before and during use), and Expectations (meeting expectations developers built).
  9. The PICSE framework with its sub-factors. Personal (intrinsic, extrinsic, and social factors): community, source reputation, clear advantages. Interaction (factors related to engagement with the tool): validation support, feedback loops, educational value. Control (factors related to control over usage): ownership, autonomy, workflow integration. System (properties of the tool before and during use): ease of installation, polished presentation, safe and secure, correctness, consistency, performance, transparent data practices. Expectations (meeting expectations developers built): meeting expectations, style matching, goal matching.
  11. Personal: intrinsic, extrinsic, and social factors. Community: there is an accessible community of developers that use the tool. "That's probably recommended because over the community that's how it's preferable. Then you're leaning towards more into the more community-wide practices." - Software Dev Engineer Lead. "Even if I trust the brand, nobody else is on there... I wouldn't download the app, the social media. If there is no network, why would I use it?" - Software Engineer
  12. Personal: intrinsic, extrinsic, and social factors. Source reputation: the reputation of, or familiarity with, the individual, organization, or platform associated with the introduction of the tool. "If a person that I personally trust a lot, for example, a coworker that I work closely with and that I have a lot of respect for, then, of course, that also carries weight." - Software Engineer. "I definitely get more excited about a Microsoft tool or product as opposed to a Google product or an Amazon product." - Senior Software Engineer
  13. Personal: intrinsic, extrinsic, and social factors. Clear advantages: the ability to see the benefits of using the tool, typically from use and validation by others. "When I tuned into that, it was a combination of seeing that and seeing how powerful it was and how easy it was... What am I doing? This looks great." - Systems Engineer. "...while I'm in that car and AI is doing the right thing, I'll see, it actually stopped the right car. It actually identified that someone was crossing the road and all those small nitpick details. Then that trust will build up and I can rely on AI okay." - Software Dev Engineer Lead
  14. Using PICSE for building trust. Source reputation (especially relevant for adoption): introduce tools via trusted sources; requires knowledge of the network (perhaps best for internal tools). Clear advantages (especially relevant for adoption): provide tool demos and comparisons; create forums for showcasing new tools (particularly internally). Community (before and during use): build and foster a community around the tool; make it visible and accessible (common on GitHub).
  15. Are there differences between trust in traditional tools and AI-assisted tools? Generally, similar priorities for both: consistency and meeting expectations are important, while interaction factors are generally less important. However, for AI-assisted tools, developers prioritize validation support, autonomy, and source reputation (who built it), and deprioritize factors like goal matching and style matching. Our study also found several similarities between developer trust in AI-assisted tools and trust in their collaborators!
  16. How can we design for trust in AI-powered code generation tools? Ruotong Wang, Ruijia Cheng, Denae Ford, Thomas Zimmermann: Investigating and Designing for Trust in AI-powered Code Generation Tools. FAccT 2024.
  17. The MATCH model for responsible trust. Designing for Responsible Trust in AI Systems: A Communication Perspective, Q. Vera Liao & S. Shyam Sundar, FAccT 2022. Trustworthiness of AI systems can be communicated via system design. (Liao and Sundar, FAccT 2022)
  18. The MATCH model for responsible trust. A trustworthiness cue is any information within a system that can cue, or contribute to, users' trust judgements. Trust affordances are displayed properties of a system that engender trustworthiness cues. Trust heuristics are any rules of thumb applied by a user to associate a given cue with a judgment of trustworthiness. (Liao and Sundar, 2022)
  19. Research questions: What do developers need to build appropriate trust with AI code generation tools? What challenges do developers face in the current trust-building process? How can we design UX enhancements to support users in building appropriate trust? Study 1 (experience sampling + debrief interviews) seeks to understand notions of trust; Study 2 (design probes + interviews) explores potential design solutions.
  20. Study 1: experience sampling + debrief interviews. Procedure: a week of collecting significant moments of using Copilot via screenshots and short descriptions, prompted through Microsoft Teams whenever participants were appreciative of, frustrated by, or hesitant/uncertain about the code generation tool. Participants: randomly sampled from 1,500 internal developers plus an interns Teams channel; 17 participants with various levels of programming experience and experience with AI-powered code generation tools. (Slide shows an example of an experience entry.)
  21. Finding 1: developers' information needs in building appropriate trust. Developers need to build reasonable expectations of the AI tool's abilities and risks: what benefits to expect when collaborating with AI, what use cases to use AI for, and what the security and privacy implications of using AI are. "It comes back to learning what Copilot is suited for versus not suited for, just building the intuition. Once you have that intuition, you don't put Copilot into positions where you know it will fail..." (P13)
  22. Finding 1 (continued). Developers want information about to what extent and in what way they can control and set preferences for what the AI produces, when and how the AI steps in, and what code context the AI uses. "I don't want Copilot to give me anything unless I type the trigger... It's too much. It started as a co-pilot, but now it's the pilot and I'm becoming the co-pilot." (P8)
  23. Finding 1 (continued). The evaluation of AI suggestions in each specific instance forms the basis of developers' trust in AI code generation tools: how good the suggestion is, and why the suggestions are made. Strategies to make sure that "the code is actually correct": logically go through the problem, validate by running the code, write formal tests.
  24. Finding 1 (summary). Expectations of the AI's abilities and risks: what benefits to expect when collaborating with AI, what use cases to use AI for, what security and privacy implications the AI brings. Ways to control the AI: what the AI produces, when and how the AI steps in, what code context the AI uses. Quality and reasons of AI suggestions: how good the suggestion is, why the suggestions are made.
  25. Finding 2: challenges developers face in building appropriate trust. Setting proper expectations: bias from initial experience and from experience with similar tools. "It takes three good recommendations to build trust versus one bad recommendation to lose trust." (P5) Controlling AI tools: lack of guidance to harness AI. "I felt like a lot of the time I ended up just fighting it." (P7) Inadequate support for evaluating individual AI suggestions: lack of debugging support and the cognitive load of reviewing. "The code reviews cost you more than actually writing the code." (P8)
  26. Study 2: design probes + interviews. Procedure: using three design probes, interview developers about affordances and trustworthiness cues that support building appropriate trust: (1) control mechanisms to set preferences, (2) explanations of suggestions, (3) feedback analytics. Participants: 12 internal and external developers with varied experience with code generation tools, work experience, and roles on their teams.
  27. Design recommendations for tool builders. Empower users to build appropriate expectations by communicating the use cases and the potential risks and benefits of the system, and by designing for evolving trust. Offer affordances and guidance in customizing the system. Provide signals for assessing the quality of code suggestions.
  28. How do online communities affect developers' trust in AI-powered tools? Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, Denae Ford: "It would work for me too": How Online Communities Shape Software Developers' Trust in AI-Powered Code Generation Tools. To appear in ACM Transactions on Interactive Intelligent Systems.
  29. Why online communities? "Trust is shaped by people's information-seeking and assessment practices through emerging information platforms." (Zhang et al., 2022) Yixuan Zhang, Nurul Suhaimi, Nutchanon Yongsatianchot, Joseph D. Gaggiano, Miso Kim, Shivani A. Patel, Yifan Sun, Stacy Marsella, Jacqueline Griffin, and Andrea G. Parker. 2022. Shifting Trust: Examining How Trust and Distrust Emerge, Transform, and Collapse in COVID-19 Information Seeking. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22). Association for Computing Machinery, New York, NY, USA, Article 78, 1–21. https://doi.org/10.1145/3491102.3501889
  30. Research questions: How do online communities shape developers' trust in AI code generation tools? How can we design to facilitate trust building in AI using affordances of online communities? Methods: semi-structured interviews with 17 developer community participants, recruited for a mix of tools and platforms, on the role of online communities in expectations of AI, use cases of AI, vulnerable situations with AI, ...; design probes in which 11 developers thought out loud about mockup prototypes and brainstormed new features.
  31. Pathway #1: community offers evaluation of AI suggestions. When unsure about AI suggestions, users go to online communities for evaluation. Code solutions in online communities are deemed trustworthy because of their transparent source, explicit evaluation and triangulation, and credibility from identity signals. "The code has been posted by other programmers, people voted on it... If others have used the solution and it worked, it gives you a little more faith."
  32. Pathway #2: users learn from others' experience with AI. Engagement with specific experiences shared by others helps users develop reasonable expectations of AI capability, strategies for when to trust AI, an empirical understanding of suggestions, and awareness of the broader implications of AI-generated code. "I read a bunch of what people think of the outcome... [It] helps me make my own perception of whether it is something that is useful for me or not. If everyone has a bad experience in the use cases that I care about, I won't trust it at all. Otherwise, I can know where to be careful and what to avoid in the future."
  33. Challenges in effectively using online communities. Despite the benefits of sharing specific experiences, user sharings lack project context and replicability, effective descriptions of the interaction with AI, and diversity and relevance. "I once saw an interesting Copilot suggestion and wanted to try it myself. But I couldn't get it even with the same prompt. I don't know what their setup is."
  34. The extended MATCH model with communities: online communities support community sensemaking and collective heuristics. Design #1: community evaluation signals. Design #2: community-curated experience.
  35. Mockup: Copilot Community Analytics. "578 code snippets similar to this have been suggested to users in your organization." 52% accepted without editing, 36% made edits, 12% rejected directly; plus voting buttons, a "See similar suggestions in Copilot Community" link, and a "Search code snippet in:" option.
  36. The same Copilot Community Analytics mockup, with the community usage statistics highlighted.
  37. The same mockup, with community voting highlighted.
  38. The same mockup, with identity/reputation signals highlighted ("See user sharings in Community").
  39. Design probes: user feedback on Design 1 (introducing community evaluation signals into the AI code generation experience). Community statistics: helpful, objective metrics for users to decide how much to trust an AI suggestion; users need more scaffolds for interpreting the numbers, e.g., user intentions and rationales.
  40. User voting: a proactive way to give feedback; users want to see the outcome of voting (e.g., customization) reflected in future AI suggestions.
  41. Identity signals: helpful for further interpreting the statistics; users want more relevance, e.g., expertise in specific tasks, and need transparency on what data is collected and how it will be used.
  42. Overall: the popup window can be distracting; users need more seamless integration into the programming workflow, e.g., previews and summaries.
  43. Mockup: Copilot Auto Recording with a COPILOT COMMUNITY side panel, "Similar Experiences" ("See how others in your organization interact with Copilot when getting similar suggestions as the one you got just now"), and vote/comment counts. The editor shows a gulpfile:

      'use strict';

      // Increase max listeners for event emitters
      require('events').EventEmitter.defaultMaxListeners = 100;

      const gulp = require('gulp');
      const util = require('./build/lib/util');
      const path = require('path');
      const compilation = require('./build/lib/compilation');

      // Fast compile for development time
      gulp.task('clean-client', util.rimraf('out'));
      gulp.task('compile-client', ['clean-client'], compilation.compileTask('out', false));
      gulp.task('watch-client', ['clean-client'], compilation.watchTask('out', false));

      // Full compile, including nls and inline sources in sourcemaps, for build
      gulp.task('clean-client-build', util.rimraf('out-build'));
      gulp.task('compile-client-build', ['clean-client-build'], compilation.compileTask('out-build', true));
      gulp.task('watch-client-build', ['clean-client-build'], compilation.watchTask('out-build', true));
  44. The same mockup with a posting dialog: "Post Interaction Snippet to Copilot Community", with fields to add a title, comments, and tags (e.g., JavaScript), edit the snippet video (current length: 30s), allow a link to the GitHub project, save as private, or share outside your organization.
  45. Mockup: Copilot Community Discovery. A feed of posts ("Interesting suggestion by Copilot in TS", 156 views, 1 hour ago; "My review of Copilot for Ruby", "Tricks to prompt Copilot", and "Using Copilot to implement a Web App", each 97 views, 3 hours ago) with view counts, timestamps, and vote counts; New and Top tabs; filters for Forked, Language, Sentiment, Topics, My likes, and My GitHub; and a "Similar to Your Experiences" section ("See how others in your organization interact with Copilot when getting similar suggestions as the ones you have gotten").
  46. Design probes: user feedback on Design 2 (a developer community dedicated to specific experiences with the AI code generation tool). IDE side panel: a useful expansion of the community statistics, but watching videos within a programming session is time-consuming; users need a more efficient way to present AI interactions, e.g., code snippets linked to the project, and assurance of confidentiality in sharing.
  47. External community: great for discovery and learning outside the programming workflow; users need richer content than a screen-recording video, e.g., voice-overs or text-based tutorials, and more lightweight options for replication.
  48. Design recommendations: dedicated user communities can help developers understand, adopt, and develop appropriate trust in code generation AI. The user community should offer scaffolds to share specific, authentic experiences with AI, integration into users' workflows, assistance in effectively utilizing community content, and assurance of privacy and confidentiality.
  49. MODELS is a pivotal fellowship in this epic journey, guiding us through the challenges and triumphs of the AI age.
  50. #1: AI for the entire software lifecycle. GitHub Copilot was focused on code editing within the IDE, but software creation is more than writing code. There is a huge opportunity to apply AI to the entire software lifecycle, including the modeling of software. The ultimate "shift left"? (AI for MODELS)
  51. #2: Help people build AI-powered software. Future software will be AI-powered ("AIware"). How can we model, build, test, and deploy these AIware systems in a scalable and disciplined way? It is important to avoid "AI debt". How can we model the architecture of AIware systems? Explainability, validation, and verification of AIware systems. (MODELS for AI)
  52. #3: Provide great human-AI interaction. It is important to figure out and model how humans will interact with AI systems, and to design an experience that makes the interaction seamless. Consider HCI from the beginning. Build systems that adapt and respond dynamically to user preferences.
  53. #4: Leverage AI for software science. There is huge potential for AI to be used in research design and data analysis, and it is a great brainstorming partner. But keep in mind: AI isn't perfect, so people need to vet its suggestions. The role of research is changing given the rapid speed of innovation, and the output and artifacts of the scientific process are changing. Can we apply model-driven techniques?
  54. Can GPT-4 summarize papers as cartoons? Yes! :-) Can GPT-4 Replicate Empirical Software Engineering Research? Jenny T. Liang, Carmen Badea, Christian Bird, Robert DeLine, Denae Ford, Nicole Forsgren, Thomas Zimmermann. PACMSE (FSE) 2024. (AI-generated images may be incorrect. None of the authors wore a lab coat during this research. :-))
  55. #5: Apply AI in a responsible way. How do we design and build software systems using AI in a responsible, ethical way that users can trust and that does not negatively affect society? What mechanisms and regulations do we need to oversee AI systems? How can we model and verify AI governance and compliance? What about societal impacts, ethical considerations, and human factors?
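The community usage statistics shown in the Copilot Community Analytics mockup (slides 35-38) could be produced by a small aggregation over per-suggestion outcome events. The sketch below is purely illustrative: the event shape ({ outcome, vote }) and the function name summarizeSnippetUsage are assumptions for this example, not part of any real Copilot API.

```javascript
// Hypothetical sketch: aggregating community usage statistics for a suggested
// code snippet, in the spirit of the Copilot Community Analytics mockup.
// The event shape ({ outcome, vote }) is an assumption, not a real API.
function summarizeSnippetUsage(events) {
  const counts = { accepted: 0, edited: 0, rejected: 0 };
  const votes = { up: 0, down: 0 };
  for (const e of events) {
    if (e.outcome in counts) counts[e.outcome] += 1;
    if (e.vote === 'up') votes.up += 1;
    if (e.vote === 'down') votes.down += 1;
  }
  const total = events.length;
  const pct = (n) => (total === 0 ? 0 : Math.round((n / total) * 100));
  return {
    total,                              // e.g., 578 similar snippets
    acceptedPct: pct(counts.accepted),  // "Accepted w/o editing"
    editedPct: pct(counts.edited),      // "Made edits"
    rejectedPct: pct(counts.rejected),  // "Rejected directly"
    votes,                              // voting-widget tallies
  };
}
```

Percentages are rounded per category here, so they may not sum to exactly 100; a production design would need to decide how to present that.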
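The Copilot Community Discovery mockup (slide 45) offers New/Top tabs and filters such as Language. One plausible way to sketch that feed logic, with entirely hypothetical field names (postedAt, upvotes, downvotes, language) chosen for illustration:

```javascript
// Hypothetical sketch of the "New"/"Top" tabs and the Language filter from the
// Copilot Community Discovery mockup. Field names are illustrative assumptions.
function filterAndSortPosts(posts, { mode = 'new', language } = {}) {
  const filtered = language
    ? posts.filter((p) => p.language === language)
    : posts.slice(); // copy so sorting does not mutate the input
  const byNew = (a, b) => b.postedAt - a.postedAt;  // newest first
  const byTop = (a, b) =>
    (b.upvotes - b.downvotes) - (a.upvotes - a.downvotes); // highest net votes first
  return filtered.sort(mode === 'top' ? byTop : byNew);
}
```

Sorting by net votes is just one choice; the slide's feedback suggests users also want relevance signals (e.g., expertise in specific tasks) folded into ranking.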