Identifying Original Works with LLMs

Identifying Original Works with LLMs, by Toshi (@ginyu_pro). LLM Fukuoka ChatGPT LT meetup.

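tl;dr
I built an automatic detector that finds the original work behind a translated book.
As a by-product, it auto-generates an article about the translated books released on a given day.
Motivation
A newly released translation has no reviews yet => I want to know the original's reviews, awards, bestseller status, and so on.
I wanted to do something with books x AI/NLP.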

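Challenges
No public data links a Japanese translation to its original work, especially right after release.
The various DBs/APIs rely on manual, optional registration, so the link may or may not be there.
Even the editions list on GoodReads, the world's largest book review site, rarely includes Japanese translations.
"Surely you can just look it up?"
Even the original spelling of the author's name is unknown => spelling variation / name disambiguation problem.
The title is often completely different.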

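Example 1
Excerpt from one of the articles auto-generated for this talk.
The title is completely different! (Only "words" barely overlaps.)
The author's name is given only in katakana!
Note: the cover often shows the original title and the original author's name, so an OCR-based approach is an option.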

Example 2: Amazon product page
For this work there is no bibliographic information about the original at all, and the author's name is in katakana only.

Example 3: Google search results
With luck, the original author's name shows up there!

Example 4: List of works returned by searching GoodReads for the original author's name
The titles are completely different (but the English summary can be taken from each work's page!).

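Overall processing flow
1. Fetch the list of books released on a given date from the National Diet Library API => keep only those with a "translator" set
2. Original identification flow (described later)
3. Extract information from the original's summary and reviews => YAML output
   pros & cons
   keywords
   related books
   When there are many reviews, run the extraction in batches and merge the resulting YAML with the LLM at the end
4. Generate markdown from the YAML => post to the blog automatically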
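Step 1 of the flow can be illustrated with a minimal sketch. The record layout here (plain dicts with a `translator` key) is an assumption made for illustration; the actual National Diet Library API response format is not shown in the slides, and the sample records are dummy data.

```python
from typing import Iterable


def has_translator(record: dict) -> bool:
    """Return True when the record carries a non-empty translator field."""
    translator = record.get("translator")
    return bool(translator and translator.strip())


def select_translated(records: Iterable[dict]) -> list[dict]:
    """Keep only records that look like translated books."""
    return [r for r in records if has_translator(r)]


# Dummy records; field names are hypothetical.
records = [
    {"title": "Book A", "author": "ジョン・スミス", "translator": "山田 太郎"},
    {"title": "Book B", "author": "佐藤 花子", "translator": ""},
    {"title": "Book C", "author": "鈴木 一郎"},
]
translated = select_translated(records)
```

Treating a missing key and an empty string the same way keeps the filter tolerant of records where the field is present but blank.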

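Original identification flow
1. Determine author candidates (do not force a single answer; keep several candidates)
   Direct estimation (just ask the LLM)
   Look for the original spelling in the summary, bibliographic information, and web search results (lucky if it is there)
2. Search a bibliographic API or similar for each author candidate and fetch a book list; do this for every candidate
3. Identify the original
   Narrow down by title similarity
   Narrow down by summary similarity (the title can change completely in the Japanese translation)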
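The three steps above can be sketched as a small orchestration function. This is only an outline under stated assumptions: `identify_original` and its callable parameters are hypothetical names, the finders and the picker stand in for the LLM prompts shown on the following slides, and the catalog is dummy data.

```python
from typing import Callable


def identify_original(
    ja_book: dict,
    name_finders: list[Callable[[dict], list[str]]],
    search_books: Callable[[str], list[dict]],
    pick_original: Callable[[dict, list[dict]], list[int]],
) -> list[dict]:
    # Step 1: collect author candidates from every finder, in order, without duplicates.
    candidates: list[str] = []
    for finder in name_finders:
        for name in finder(ja_book):
            if name not in candidates:
                candidates.append(name)

    # Step 2: query the bibliographic source once per candidate.
    books: list[dict] = []
    for name in candidates:
        books.extend(search_books(name))
    if not books:
        return []

    # Step 3: let the picker return indexes of probable originals; drop invalid ones.
    indexes = pick_original(ja_book, books)
    return [books[i] for i in indexes if 0 <= i < len(books)]


# Stubs standing in for LLM calls and a bibliographic API (dummy data).
catalog = {"John Smith": [{"title": "Words of Stone"}, {"title": "Other"}]}
result = identify_original(
    {"title": "石のことば", "author": "ジョン・スミス"},
    name_finders=[lambda b: ["John Smith", "Jon Smith"], lambda b: ["John Smith"]],
    search_books=lambda name: catalog.get(name, []),
    pick_original=lambda b, books: [0],
)
```

Keeping several candidates until step 3 mirrors the slide's advice not to commit to one author spelling too early.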

NameFinder1
Ask the LLM directly for the original Latin-alphabet spelling of a katakana author name. This works quite well for common names.
Question: translate person name "{ja_name}" to original language. Please list up all possible answers, variations, with low confidence. The answer name should be standardized as format like "<first name> <family name>" for western name.
desired answer format: string array in JSON format.
example: ["Thomas Edison", "Albert Einstein"]
Answer:
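Since the prompt asks for a JSON string array, the reply has to be parsed defensively: models sometimes wrap the array in extra words. A minimal sketch, assuming the reply is available as a plain string; `parse_name_list` is a hypothetical helper, not something shown in the slides.

```python
import json
import re


def parse_name_list(answer: str) -> list[str]:
    """Extract the first JSON array from an LLM answer; return [] on failure."""
    match = re.search(r"\[.*?\]", answer, flags=re.DOTALL)
    if match is None:
        return []
    try:
        value = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [v.strip() for v in value if isinstance(v, str) and v.strip()]


names = parse_name_list('Sure! ["Thomas Edison", " Tomas Edison "]')
```

Returning an empty list on any parse failure lets the caller fall back to the other name finders instead of aborting.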

NameFinder2
Put the text of Google search results into source. With luck, the original author name appears somewhere in it.
Question: translate author name "{ja_author}" to original language. Please list up all possible answers, based on given source text.
desired answer format: JSON like {{"names": ["name1","name2"], "contains": true}}
- names(list[str]): estimated author names list
- express each name in standard format: "<first name> <family name>"
- contains(bool): whether if you estimated upon given text or not
source text: {source}
Answer:
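The doubled braces in this prompt are consistent with Python's `str.format`, where `{{` and `}}` produce literal braces. The sketch below assumes that is how the template is filled, which the slides do not state, and adds a hypothetical parser for the `names` / `contains` reply. The template is shortened for illustration.

```python
import json

TEMPLATE = (
    'desired answer format: JSON like {{"names": ["name1","name2"], "contains": true}}\n'
    "source text: {source}\n"
    "Answer:"
)


def parse_names_answer(answer: str) -> tuple[list[str], bool]:
    """Parse a {"names": [...], "contains": bool} reply; ([], False) on failure."""
    start, end = answer.find("{"), answer.rfind("}")
    if start == -1 or end < start:
        return [], False
    try:
        data = json.loads(answer[start:end + 1])
    except json.JSONDecodeError:
        return [], False
    if not isinstance(data, dict):
        return [], False
    names = data.get("names", [])
    if not isinstance(names, list):
        names = []
    return [n for n in names if isinstance(n, str)], data.get("contains") is True


prompt = TEMPLATE.format(source="(search result text goes here)")
parsed = parse_names_answer('Answer: {"names": ["Jane Doe"], "contains": true}')
```

The `contains` flag is useful downstream: a name the model says it read from the source text can reasonably be ranked above a pure guess.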

AuthorFinder
Put the Japanese translation's summary and bibliographic information into book_info. With luck, the original author name appears somewhere in it.
I show you a book information written in Japanese. Please answer author names with text array JSON format.
Answer example: ["Thomas Edison"]
{book_info}

NameCorrector
Standardizes names and resolves spelling variations. Used as preprocessing before querying bibliographic APIs.
please rewrite given names in the manner of: <first name> <family name>
good: "John Smith"
bad: "Smith, John"
desired output format: JSON string list (ex. ["Nancy McDonald", "Thomas Edison"])
names: {names}
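The slide delegates this normalization to the LLM. For comparison only, the simplest case ("Smith, John") can also be handled by a rule, which may be handy as a cheap pre-pass or as a check on the LLM output. The sketch below is not the author's method; `normalize_name` is a hypothetical helper and it covers only the single-comma pattern.

```python
def normalize_name(name: str) -> str:
    """Rewrite "Family, Given" as "Given Family"; otherwise only tidy whitespace."""
    name = " ".join(name.split())
    if name.count(",") == 1:
        family, given = (part.strip() for part in name.split(","))
        if family and given:
            return f"{given} {family}"
    return name


normalized = [normalize_name(n) for n in ["Smith, John", "  Nancy   McDonald "]]
```

Anything beyond this pattern (suffixes, multiple commas, non-western name order) is left untouched, which is where the LLM prompt above earns its keep.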

BookIdentifierByTitle
Estimates the original from title similarity.
I show you a Japanese-translated book title and summary, and some book title list, which must contain original book title. Please list up possible, probable answers even with low confidence. Answer count should be {min_answer} at least. Answer format should be only its indexes (ex. [0], [1, 7]). Please mind that Index starts with 0. If it's difficult to choose, return all indexes.
Japanese title: "{{ja_title}}"
Japanese summary: """ {{ja_summary}} """
title list: {{titles}}
# desired answer format: number list in JSON format(example: "[1,3]", "[0]")
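Because the prompt asks for zero-based indexes, the title list needs to be shown with those indexes and the reply checked against the list length. A minimal sketch: the `index: title` layout and both helper names are assumptions, since the slides do not show how `titles` is rendered or how the reply is parsed.

```python
import json
import re


def format_title_list(titles: list[str]) -> str:
    """Render titles one per line, prefixed with their zero-based index."""
    return "\n".join(f"{i}: {t}" for i, t in enumerate(titles))


def parse_indexes(answer: str, n_titles: int) -> list[int]:
    """Return valid, de-duplicated indexes; fall back to all indexes."""
    everything = list(range(n_titles))
    match = re.search(r"\[[\d,\s]*\]", answer)
    if match is None:
        return everything
    try:
        raw = json.loads(match.group(0))
    except json.JSONDecodeError:
        return everything
    picked: list[int] = []
    for i in raw:
        if 0 <= i < n_titles and i not in picked:
            picked.append(i)
    return picked or everything


listing = format_title_list(["First Title", "Second Title"])
picked = parse_indexes("Probably [1, 7]", 5)
```

Falling back to every index when the reply is unusable matches the prompt's own instruction to "return all indexes" when the choice is hard, leaving the final decision to the summary-based step.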

BookIdentifierByTitleAndSummary
Estimates the original from the similarity of both title and summary.
I show you a book description in Japanese, and {n_samples} English descriptions. One of them correspond to Japanese one. Please estimate it. Answer just sample index(starting from 0), without any sentence.
# desired answer format: integer
# Japanese description: {ja_description}
# English description samples: {en_descriptions}
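Here the expected reply is a single integer, so a small validator is enough. The sketch assumes the reply arrives as a string and that an out-of-range or missing number should count as "no answer"; `parse_sample_index` is a hypothetical name.

```python
import re
from typing import Optional


def parse_sample_index(answer: str, n_samples: int) -> Optional[int]:
    """Return the first integer in the answer if it is a valid sample index."""
    match = re.search(r"\d+", answer)
    if match is None:
        return None
    index = int(match.group(0))
    return index if index < n_samples else None


chosen = parse_sample_index("Sample 1.", 3)
```

Returning `None` rather than guessing keeps a wrong original from flowing into the generated article.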

Reporter
Extracts structured information from review text.
please do 3 tasks based on the text below. Answer format should be YAML.
text: \"\"\" {reviews} \"\"\"
tasks:
- task1: describe pros and cons.
- task2: list up technical terms or jargons with short description and difficulty level(1-5)
- task3: list up related book titles referred in reviews
desired answer format: \"\"\"
pros:
- "XXX"
- "YYY"
- ...
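The overall flow mentions running the extraction in batches when there are many reviews. One simple way to form those batches is by a character budget, sketched below. The budget-based rule and the name `chunk_reviews` are assumptions; the slides do not say how the split is done.

```python
def chunk_reviews(reviews: list[str], max_chars: int) -> list[list[str]]:
    """Group reviews into batches whose total length stays within max_chars.

    A single review longer than max_chars still gets a batch of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for review in reviews:
        if current and size + len(review) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(review)
        size += len(review)
    if current:
        chunks.append(current)
    return chunks


batches = chunk_reviews(["aaaa", "bbbb", "cc"], max_chars=6)
```

Each batch would then fill the `{reviews}` slot of the prompt above, producing one YAML result per batch for the final merge.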

Generated article: result 1

Generated article: result 2

Generated article: result 3

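Bonus: further interests
Analysis of the worldwide translation network, e.g. the popularity of Japanese literature in Eastern Europe.
Differences in titles and covers from country to country.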

Demo blog for this project, 新刊邦訳紹介 (introducing newly released Japanese translations): https://hitokun.hatenablog.com/
