
komo_fr
December 05, 2025

PyLadiesCon 2025: “Go Straight, Then Turn Right!”: How I Built a Voice-Controlled Toy Car Using Generative AI, Gradio, and Raspberry Pi Pico

This slide deck is from my PyLadiesCon 2025 talk,
“Go Straight, Then Turn Right!”: How I Built a Voice-Controlled Toy Car Using Generative AI, Gradio, and Raspberry Pi Pico

I introduce a small but fun project where a toy car moves based on voice or image instructions, powered by Python, multimodal inputs, and a Raspberry Pi Pico.
The slides are bilingual (Japanese and English).

Session Page:
https://2025.conference.pyladies.com/en/session/go-straight-then-turn-right-how-i-built-a-voice-controlled-toy-car-using-generative-ai-gradio-and-raspberry-pi-pico/

GitHub:
https://github.com/komo-fr/llm-pico-car

X (Twitter):
https://x.com/komo_fr


Transcript

1. Self-introduction: Tomoko Furuki. X (Twitter): @komo_fr. Lives in Japan. Works at BeProud Inc. (https://www.beproud.jp/). Usually works on technical support in the data science field and on system development, mainly using Python.
2. Today's talk: LLM Pico Car, a toy car that moves using voice and images, powered by generative AI; what libraries and technologies I used (libraries and code); and what I improved or tried to make unique. I hope this gives you some ideas for combining generative AI and IoT.
3. Outline: What did I make / How did I make it (Step 1: Input voice from a web app; Step 2: Transcribe the voice and convert it to structured actions using generative AI; Step 3: Receive and execute commands) / Result / Application: image-based commands / Summary.
4. What did I make: LLM Pico Car, a toy car that moves by voice or image. The idea came from a thought: "Wouldn't it be cute if a toy car could follow voice commands like a pet?" ("Go straight and turn right!" "Dance!")
5. Idea: Using generative AI, natural language can easily be converted into structured actions. Prompt example: "Convert natural-language instructions into a list of action dictionaries." Natural language: "Go straight and turn right!" → structured actions (list of dictionaries): move forward 20 cm, turn right 90 degrees.

[
  {"action": {"type": "forward", "distance_cm": 20}},
  {"action": {"type": "turn", "direction": "right", "angle": 90}},
  ...
]

6. DEMO: LLM Pico Car. "Go straight, then move a little to the right!" "Dance!" (Demo video on YouTube.)
7. Hardware used in this project (Pico Car): Kitronik Autonomous Robotics Platform (Buggy) for Pico (https://kitronik.co.uk/). Motors with wheels (×2), LEDs, and a buzzer. Not used in this project, but a distance sensor, servo connectors, and a line-following sensor are also available.
8. Why I chose this kit: I don't have much experience with electronics, so I wanted to use a ready-made kit. Lithium-ion batteries seemed difficult to manage, so I wanted the car to run on regular dry-cell batteries. So I chose this kit, which works with the Raspberry Pi Pico and runs on regular batteries.
9. Overall architecture. Step 1: Input voice from a web app (web browser on a smartphone) → audio data → Step 2: Transcribe the voice and convert it to structured actions using generative AI (Pico Controller Server, running on a Mac) → structured actions → Step 3: Receive and execute commands (Pico Car Server, running on the Raspberry Pi Pico car).
10. Reason for this architecture: It is difficult to run speech recognition or generative AI directly on the Pico car (the processing is heavy, and some libraries are not available in MicroPython), so I offloaded the speech recognition and generative AI tasks to a web server running on my Mac; the Pico car handles only motor and LED control. Also, if I only talk to the PC, it doesn't feel like I'm actually talking to the Pico car, so I created a web-based UI that can be operated from a smartphone, like using a walkie-talkie to talk to the car.
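
As a rough illustration of how the two servers might talk to each other, here is a minimal sketch of a send_commands helper on the Mac-side controller server. The PICO_CAR_URL constant and its address are names introduced here for illustration; the /command endpoint and the {"status": "ok"} response match the Microdot server shown later in the slides.

import requests

PICO_CAR_URL = "http://192.168.0.50:5001"  # hypothetical address of the Pico Car Server on the LAN

def send_commands(commands: list[dict]) -> dict:
    # POST the structured actions (list of dictionaries) to the Pico Car Server's /command endpoint
    response = requests.post(f"{PICO_CAR_URL}/command", json=commands, timeout=30)
    response.raise_for_status()
    return response.json()  # e.g. {"status": "ok"}
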
11. Step 1: Input voice from a web app. (Architecture diagram from slide 9, highlighting Step 1: the web browser on a smartphone sends audio data to the Pico Controller Server on the Mac.)
12. Creating a web app with Gradio (Step 1: Input voice from a web app). What is Gradio? A web app framework that makes it easy to build demo apps for machine learning. It provides built-in components for voice input, image input, and more, so you can build a UI easily even without front-end knowledge.
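
To give a feel for how little code a Gradio demo needs, here is a minimal, generic example (not from the talk) that records microphone audio and simply echoes back the temporary file path Gradio saved it to:

import gradio as gr

def echo_audio_path(audio_path):
    # Gradio hands the recording over as a path to a temporary WAV file
    return audio_path or "No audio recorded."

demo = gr.Interface(
    fn=echo_audio_path,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs="text",
)
demo.launch()
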
13. The UI created for this project (Step 1: Input voice from a web app): a voice input component; a send button and a reset button; a textbox displaying the transcription result; a textbox displaying the current status; an area displaying the structured actions (list of dictionaries); and progress indicator lights.
14. Code (Step 1: Input voice from a web app), defining the UI components:

with gr.Blocks() as demo:
    ... (omitted) ...
    with gr.Tab("音声指示"):  # "Voice instructions" tab
        # Define components
        audio_input = gr.Audio(sources=["microphone"], type="filepath", format="wav")  # voice input component
        with gr.Row():
            audio_send_button = gr.Button("送信", variant="primary")        # Send button
            audio_reset_button = gr.Button("リセット", variant="secondary")  # Reset button
        audio_transcript_output = gr.Textbox(label="文字起こし結果")  # transcription result
        audio_status_output = gr.Textbox(label="送信ステータス")      # send status
        audio_commands_output = gr.JSON(label="生成されたコマンド")    # generated commands
        # Define click event handlers
        ... (omitted) ...

# Launch the server
demo.launch(server_name="0.0.0.0", share=True)

15. Code (layout definitions): the same code as slide 14, annotated to show where the voice input component, the buttons, and the result-display components are placed.
16. Code (event handling), wiring up the click events:

with gr.Blocks() as demo:
    ... (omitted) ...
    with gr.Tab("音声指示"):
        # Define components
        ... (omitted) ...
        # Define click event handlers
        audio_send_button.click(
            handler_audio_input,                 # function called when the button is clicked
            inputs=audio_input,                  # component used as input
            outputs=[audio_transcript_output,    # components used as output
                     audio_commands_output,
                     audio_status_output,
                     progress_lamps],
        )
        audio_reset_button.click(... omitted ...)
        audio_input.start_recording(... omitted ...)
        audio_input.stop_recording(... omitted ...)

# Launch the server
demo.launch(server_name="0.0.0.0", share=True)

17. Code (launching the server): the same code as slide 16; the final line, demo.launch(server_name="0.0.0.0", share=True), starts the web server.
18. Reducing perceived waiting time (Step 1: Input voice from a web app): transcription and generative AI processing take time, so instead of waiting for all processing to finish before showing the result, the UI displays intermediate results step by step: "Transcription completed" → "Generative AI processing completed" → "Pico Car processing completed".
19. Displaying intermediate results with yield: with Gradio, it is easy to build a UI that returns results gradually; inside the event handler, use yield to return intermediate results.

audio_send_button.click(
    handler_audio_input,   # function called when the button is clicked
    inputs=audio_input,
    outputs=[audio_transcript_output, audio_commands_output,
             audio_status_output, progress_lamps],   # components used as output
)

def handler_audio_input(audio_file):
    # 1. Run transcription
    success, text = transcribe(audio_file)
    if success:
        # 2. Return the transcription result with yield
        yield text, None, "文字起こし完了", render_lamps(["s", "u", "u"])
        # 3. Convert the text into structured actions using generative AI
        commands = text_to_command(text)
        commands = convert_commands(commands)
        # 4. Return the AI-generated result with yield
        yield text, commands, "コマンド生成完了", render_lamps(["s", "s", "u"])
        # 5. Send the structured actions to the Pico Car
        response = send_commands(commands)
        status = response.get("status")
        # 6. Return all results with yield
        yield text, commands, status, render_lamps(["s"] * 3)
    else:
        yield text, None, "文字起こしに失敗しました。終了します", render_lamps(["e", "u", "u"])

20. Step 2: Transcribe the voice and convert it to structured actions using generative AI. (Architecture diagram from slide 9, highlighting Step 2 on the Pico Controller Server running on the Mac.)
21. Audio → text (transcription): sending raw audio directly to a generative AI could get costly due to API usage fees, so transcription is performed locally with faster-whisper.

from faster_whisper import WhisperModel

# 1. Load the model
model_size = "tiny"
model = WhisperModel(model_size, device="cpu", compute_type="float32")
... (omitted) ...
# 2. Run transcription
segments, _ = model.transcribe(tmp_audio_path, language="ja")
text = "".join([segment.text for segment in segments])

22. Text → structured actions: LangChain is a Python framework for developing LLM-based applications, released in 2022. Via LangChain, the OpenAI API (gpt-4.1) is called to convert natural-language instructions into structured actions (a list of dictionaries). Natural language: "Go straight and turn right!" → structured actions: move forward 20 cm, turn right 90 degrees.

[
  {"action": {"type": "forward", "distance_cm": 20}},
  {"action": {"type": "turn", "direction": "right", "angle": 90}},
  ...
]

23. Code (Step 2: Transcribe the voice and convert it to structured actions using generative AI):

from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
... (omitted) ...

# 1. Build the prompt template
prompt = ChatPromptTemplate(
    [
        ("system", TEXT_TO_COMMAND_PROMPT),
        ("human", "指示: {text}"),  # "Instruction: {text}"
    ]
)
# 2. Initialize the model
llm = init_chat_model(model="gpt-4.1", temperature=0)
# 3. Connect the prompt and the model, then execute
chain = prompt | llm
result = chain.invoke({"text": text})
instructions = result.content
# Add RGB info based on the LED color names (CSS color names) in the generated data
instructions = _add_led_rgb(instructions)
instructions = json.loads(instructions)

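The slides do not show _add_led_rgb itself. As a hedged sketch of what such a helper might do, assuming a small hand-written mapping of CSS color names (the dictionary and fallback below are illustrative, not the project's actual table):

import json

# Illustrative mapping only; the real project may resolve CSS color names differently.
CSS_COLOR_TO_RGB = {
    "green": (0, 255, 0),
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "pink": (255, 192, 203),
}

def _add_led_rgb(instructions_json: str) -> str:
    # Parse the generated JSON, attach an "rgb" value next to each LED color name,
    # and return the result as a JSON string again (matching how it is used above).
    commands = json.loads(instructions_json)
    for command in commands:
        led = command.get("led")
        if led and "color" in led:
            led["rgb"] = CSS_COLOR_TO_RGB.get(led["color"], (255, 255, 255))
    return json.dumps(commands)
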
24. Prompt (Step 2): the code is the same as on slide 23; TEXT_TO_COMMAND_PROMPT reads roughly: "Please convert the given Japanese instruction text into a JSON array of motion commands for the Pico Car. If the input is not an instruction (for example, 'Good morning!' or 'You fool!'), convert it into a command array that expresses the car's emotion in response to those words. Output only the command array; do not include any explanations or additional text. Each command should follow the format below: ... (omitted) ..."
25. Key points for the prompt: when a casual or emotional phrase is given instead of an instruction, the prompt tells the model to express a matching emotion through movement and LED color, so the car still gives some kind of lifelike reaction even to vague input. Examples: "You're so cute" → move forward with pink LEDs while shaking side to side (joy); "You fool!" → move backward with blue LEDs (sadness); "Let's make up" → move forward with green LEDs (positivity).
26. Example of generated action data: the first command is "move forward 20 cm" with green LEDs, the second is "turn right 90 degrees" with green LEDs.

[
  {"action": {"type": "forward", "distance_cm": 20},
   "led": {"color": "green", "rgb": (0, 255, 0)}},
  {"action": {"type": "turn", "direction": "right", "angle": 90},
   "led": {"color": "green", "rgb": (0, 255, 0)}},
  ...
]

27. Step 3: Receive and execute commands. (Architecture diagram from slide 9, highlighting Step 3 on the Pico Car Server running on the Raspberry Pi Pico car.)
28. Running a web API server on the Pico: the Raspberry Pi Pico does not run standard CPython, so MicroPython is used. The web API server is implemented with Microdot, a minimal web framework inspired by Flask. Microdot supports both CPython and MicroPython, so it runs directly on the Pico; for this project, simply placing microdot.py (about 56 KB) on the Pico is enough.
29. Code (Step 3: Receive and execute commands):

from microdot import Microdot
from pico_car import Car, RealCar

# Create an instance of the custom Pico Car class
# (a setting switches between the mock and the real hardware, for smoother testing)
car = Car() if settings.USE_MOCK else RealCar()
# Create a Microdot app (Flask-like syntax)
app = Microdot()
... (omitted) ...

# Define the API that receives and executes commands
@app.route("/command", methods=["POST"])
async def handle_command(request):
    try:
        # 1. Play a notification sound: logs are hard to check on the real Pico,
        #    so sounds signal which step is currently running
        car.sound_freq(1000, 0.1)
        car.sound_freq(1200, 0.1)
        # 2. Execute the commands
        command_list = request.json
        car.run_commands(command_list)
        return {"status": "ok"}
    except Exception as e:
        # Error handling (play an error sound, stop the motors, etc.)
        handle_buggy_error()
        print_exception(e)
        return {"status": "error", "message": str(e)}, 500
... (omitted) ...

# Launch the server
app.run(port=5001)

30.–32. (Repeat of the code shown on slide 29.)
33. Processing commands in order (Step 3): the car simply executes each command sequentially in a for loop and controls the motors and LEDs according to each command. The motors and LEDs are driven through the library for the Pico Car kit: https://github.com/KitronikLtd/Kitronik-Pico-Autonomous-Robotics-Platform-MicroPython. A rough sketch of such a loop follows below.
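
A minimal sketch of what that command loop could look like. The car object and its set_leds, move_forward, turn, and stop_motors methods are placeholders introduced here for illustration; the actual Car/RealCar classes in the repo wrap the Kitronik kit library.

def run_commands(car, command_list):
    # Execute each command in order, dispatching on its "action" type
    for command in command_list:
        action = command.get("action", {})
        led = command.get("led")
        if led:
            car.set_leds(led["rgb"])                         # placeholder: set the ZIP LED color
        if action.get("type") == "forward":
            car.move_forward(action["distance_cm"])          # placeholder: drive forward
        elif action.get("type") == "turn":
            car.turn(action["direction"], action["angle"])   # placeholder: rotate in place
        car.stop_motors()                                    # placeholder: stop between commands
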
34. Adjusting motor speed (Step 3): depending on the surface the car runs on (flooring, carpet, and so on), the actual travel distance and turning angle can vary even for the same command. So I built a settings panel in Gradio to adjust parameters such as motor speed: retrieve the current settings, measure the distance and angle the car actually covers when it moves for one second, and update the settings based on the measured values. A sketch of such a settings panel follows below.
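
A minimal sketch of what a Gradio settings tab for this calibration might look like. The parameter names, ranges, and the save_settings helper are illustrative assumptions, not the project's actual settings screen.

import gradio as gr

def save_settings(motor_speed, cm_per_second, degrees_per_second):
    # Persist the calibration values; the real project presumably pushes them to the Pico Car server.
    return f"Saved: speed={motor_speed}, {cm_per_second} cm/s, {degrees_per_second} deg/s"

with gr.Blocks() as settings_demo:
    with gr.Tab("Settings"):
        motor_speed = gr.Slider(minimum=0, maximum=100, value=50, step=1, label="Motor speed")
        cm_per_second = gr.Number(value=20, label="Measured distance per second (cm)")
        degrees_per_second = gr.Number(value=90, label="Measured rotation per second (degrees)")
        save_button = gr.Button("Save")
        save_status = gr.Textbox(label="Status")
        save_button.click(save_settings,
                          inputs=[motor_speed, cm_per_second, degrees_per_second],
                          outputs=save_status)

settings_demo.launch()
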
35. Interesting points (Result): each individual technology is very simple, but combining them creates something fun, and generative AI × Pico still has plenty of room for more interesting applications. Ideas: add more actions to the Pico Car (e.g., more LED lighting patterns, or wagging a "tail" with a servo motor, sketched below); customize the Pico Car's personality through the prompt (e.g., when told "You fool!", does it react sadly or angrily, with the LEDs turning blue or red?).
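
For the tail-wagging idea, here is a minimal MicroPython sketch of driving a hobby servo from the Pico with PWM; the GPIO pin and pulse widths are assumptions and would depend on the kit's servo connector.

from machine import Pin, PWM
import time

servo = PWM(Pin(15))   # assumed GPIO for the servo connector; check the kit's pinout
servo.freq(50)         # standard 50 Hz servo signal

def wag_tail(times=3):
    # Swing the servo between roughly 1 ms and 2 ms pulses (the two ends of its travel)
    for _ in range(times):
        servo.duty_u16(3277)   # ~1 ms pulse at 50 Hz
        time.sleep(0.3)
        servo.duty_u16(6554)   # ~2 ms pulse at 50 Hz
        time.sleep(0.3)

wag_tail()
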
36. Application: controlling the Pico Car with a route image. A route image is input from the UI built with Gradio (drawn on a canvas with gr.ImageEditor; "S" marks the start point, "G" marks the goal), and the image is sent to the generative AI together with a prompt: "The provided image contains a drawn route. 'S' marks the starting point, and 'G' marks the goal. Convert this route image into a JSON-formatted sequence of motion commands for the Pico Car. ... (omitted) ..."
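
A minimal sketch of how a route image might be handed to the model together with that prompt, assuming the canvas drawing has been saved to image_path and passing the image as a base64 data URL (a generic LangChain multimodal pattern, not the talk's actual code):

import base64
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage

ROUTE_PROMPT = (
    "The provided image contains a drawn route. 'S' marks the starting point and "
    "'G' marks the goal. Convert this route image into a JSON-formatted sequence "
    "of motion commands for the Pico Car."
)

def route_image_to_commands(image_path: str) -> str:
    # Encode the drawn route image and send it to the model alongside the text prompt
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    llm = init_chat_model(model="gpt-4.1", temperature=0)
    message = HumanMessage(
        content=[
            {"type": "text", "text": ROUTE_PROMPT},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]
    )
    return llm.invoke([message]).content  # expected to be a JSON array of commands
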
37. Summary: I introduced how to build a toy car that moves using voice and images with generative AI. With LLMs, tasks that used to be difficult can now be implemented easily; with Gradio, handling voice and image input is simple; with LangChain, it is easy to integrate generative AI into an application. Generative AI × Pico still has great potential for more creative applications.