Searchnos & Search on Nostr
Yoji Shidara
November 02, 2023
Nostrasia day 3
2023-11-03
Transcript
Searchnos & Search on Nostr
@darashi
npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c
2023-11-03 Nostrasia day 3
Let me introduce myself,
with my works...
Articles:
I wrote "NIP-01 を読む (Reading NIP-01)" and a full Japanese translation
of NIP-01
for ...
"Hello, Nostr!" Fanzine https://nip-book.nostr-jp.org/
And the second issue is in press now,
"Hello, Nostr! Yo Bluesky!" Fanzine will be available on 12th
November 2023, at Techbookfest 15, Ikebukuro https://nip-book.nostr-jp.org/
I wrote a short summary of the NIP-01 updates since the
first issue, and
the Japanese translation of the latest version of NIP-01.
I also wrote "作ってわかる Nostr プロトコル (Understanding the Nostr
Protocol by Writing Code)" for the series on Nostr,
in Software Design, November 2023 issue
My article is the fifth in the series,
written by members of the Japanese Nostr community.
You can buy Software Design magazine at bookstores in Japan.
Good as a souvenir :)
Software:
murasaki: Nostr to Speech. A client that reads notes aloud with text-to-speech.
Mapnos: Map Notes and Other Stuff. Shows geotagged kind 1
notes on a map. https://mapnos.vercel.app/
nostrbuzzs: buzzphrase detector for Nostr. Detects trending phrases in real
time. https://nostrbuzzs.deno.dev/
This is a kind of "algo" you might not like.
The point is that anyone can implement their own algo
with Nostr.
I just think it's fun to see what's going on
in Nostr,
especially at this early stage of Nostr.
nos.today: Web client for NIP-50 search
Searchnos: NIP-50 relay
Today, I'm going to talk about "Searchnos".
Searchnos is a NIP-50 relay with Elasticsearch as its backend.
It is open source and available on GitHub. https://github.com/darashi/searchnos
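For readers new to NIP-50: a search request is an ordinary NIP-01 REQ message whose filter carries a `search` field. The following is a minimal sketch that only builds such a message; the helper name, the subscription ID, and the query are made up for illustration, and sending it over a WebSocket is left out.

```python
import json

def build_search_req(sub_id: str, query: str, kinds=None, limit: int = 20) -> str:
    """Build a NIP-01 REQ message whose filter uses the NIP-50 `search` field."""
    filt = {"search": query, "limit": limit}
    if kinds is not None:
        filt["kinds"] = list(kinds)
    # ensure_ascii=False keeps Japanese queries readable in the JSON text.
    return json.dumps(["REQ", sub_id, filt], ensure_ascii=False)

# Hypothetical subscription ID and query.
print(build_search_req("sub-1", "nostrasia", kinds=[1]))
```

A client would send this string to a NIP-50 relay and then read EVENT messages until EOSE.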
Motivation:
I want to search on Nostr in Japanese. (for nostrbuzzs)
As far as I know, at the time I started
developing Searchnos,
relay.nostr.band was the only public relay that supported NIP-50.
relay.nostr.band works very well in many cases,
but I noticed that unexpected results are sometimes returned when querying
in Japanese.
I'm not sure, but it may be due to tokenization.
A typical full-text search approach is to tokenize the text
into words:
"One beer, please." -> ["one", "beer", "please"]
The query "please beer" (AND search) should match the text
"One beer, please."
In Japanese, words are not separated by spaces:
ビールを一杯ください。 bii-ru o ippai kudasai (One beer, please.)
Using a technique called morphological analysis, it is possible to
break the text into words.
ビールを一杯ください。 -> ["ビール", "を", "一杯", "ください"]
This analysis depends on dictionaries,
and it's not always correct.
It is especially weak with new words.
Another approach is to use N-gram indexing.
ビールを一杯ください。 -> ["ビー", "ール", "ルを", "を一",
"一杯", "杯く", "くだ", "ださ", "さい", "い。"] (bi-gram)
If we query "ルを ビー" (this doesn't make sense),
it will be tokenized as ["ルを", "ビー"].
If we treat these tokens in the same way as
English words,
it can result in false positives, because
["ビー", "ール", "ルを", "を一", "一杯", "杯く", "くだ", "ださ", "さい", "い。"] ⊇ ["ルを", "ビー"]
We need to use N-gram indexing and consider the position
of the tokens.
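The false positive described above, and the effect of taking token positions into account, can be reproduced in a few lines. This is a minimal sketch, not how Searchnos or Elasticsearch implement it: the function names are made up, and "positional" matching here simply means that the query tokens must appear consecutively and in the same order in the document's token list.

```python
def ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def set_match(query_tokens: list[str], text: str) -> bool:
    """Position-blind AND match: every query token occurs somewhere in text."""
    doc_tokens = set(ngrams(text))
    return all(token in doc_tokens for token in query_tokens)

def positional_match(query_tokens: list[str], text: str) -> bool:
    """Position-aware match: query tokens occur consecutively, in order."""
    doc_tokens = ngrams(text)
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

text = "ビールを一杯ください。"
nonsense = ["ルを", "ビー"]   # tokens of the nonsense query from the slide
real = ngrams("ビール")       # bi-grams of a phrase that does occur in the text

print(set_match(nonsense, text))         # True  (the false positive)
print(positional_match(nonsense, text))  # False
print(positional_match(real, text))      # True
```

The position-blind version accepts the nonsense query because it only checks set membership; the position-aware version rejects it while still matching a real phrase.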
I won't go into more detail today ...
Anyway, some effort is needed.
In order to tackle Japanese language specific problems,
it seemed like a good idea to have my own
relay implementation.
So I made Searchnos.
Architecture:
In order to simplify the implementation,
Searchnos continuously polls Elasticsearch after EOSE.
Sequence diagram:
Participants: Source Relay, Indexer, Elasticsearch, Searchnos Relay, Client
Messages in order (the diagram contains two loop frames):
1. REQ
2. query
3. response
4. EVENT (if matched)
5. EVENT (if matched)
6. EOSE
7. EVENT
8. EVENT
9. index request
10. wait
11. query
12. response
13. EVENT (if matched)
14. CLOSE
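The "poll after EOSE" idea can be sketched with an in-memory stand-in for the backend. Everything here is an assumption made for illustration (the `FakeIndex` and `Subscription` classes, the substring match, the `since` cursor); it is not Searchnos code and it does not use the Elasticsearch API. It only shows the pattern: run the same query repeatedly and forward the hits that have not been sent yet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    id: str
    created_at: int
    content: str

class FakeIndex:
    """In-memory stand-in for the search backend (not the Elasticsearch API)."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def add(self, event: Event) -> None:
        self.events.append(event)

    def search(self, term: str, since: int) -> list[Event]:
        hits = [e for e in self.events
                if term in e.content and e.created_at >= since]
        return sorted(hits, key=lambda e: e.created_at)

class Subscription:
    """One client REQ: remembers what was sent so each poll adds only new hits."""

    def __init__(self, index: FakeIndex, term: str) -> None:
        self.index = index
        self.term = term
        self.since = 0
        self.sent: set[str] = set()

    def poll(self) -> list[Event]:
        fresh = [e for e in self.index.search(self.term, self.since)
                 if e.id not in self.sent]
        for event in fresh:
            self.sent.add(event.id)
            self.since = max(self.since, event.created_at)
        return fresh

index = FakeIndex()
index.add(Event("a", 100, "hello nostr"))
sub = Subscription(index, "nostr")
print([e.id for e in sub.poll()])  # stored events, sent before EOSE
index.add(Event("b", 101, "nostr search"))
print([e.id for e in sub.poll()])  # a later poll returns only the new event
```

In a real relay, `poll` would run on a timer (the "wait" step) until the client sends CLOSE, and the set of sent IDs would need to be bounded.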
I'm running Searchnos at wss://search.nos.today
Some details & future work:
(1) Stop polling Elasticsearch for events after EOSE
Doing this can reduce the latency of search results and
the load on Elasticsearch.
We would need to implement a "filter evaluator",
and it would duplicate the matching already done through Elasticsearch.
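To make the "filter evaluator" idea concrete, here is a minimal sketch of evaluating a NIP-01 filter against a single event in memory. The function names are made up, and the `search` branch is a naive lowercase substring check standing in for the full-text matching that Elasticsearch performs, which is exactly the part that would have to be duplicated. `limit` is ignored because it only applies to the initial query.

```python
def matches_filter(event: dict, filt: dict) -> bool:
    """Return True if a Nostr event satisfies every condition in one filter."""
    if "ids" in filt and event["id"] not in filt["ids"]:
        return False
    if "authors" in filt and event["pubkey"] not in filt["authors"]:
        return False
    if "kinds" in filt and event["kind"] not in filt["kinds"]:
        return False
    if "since" in filt and event["created_at"] < filt["since"]:
        return False
    if "until" in filt and event["created_at"] > filt["until"]:
        return False
    for key, wanted in filt.items():
        # Tag filters look like {"#e": [...]}: one "#" plus a single letter.
        if len(key) == 2 and key.startswith("#"):
            values = {t[1] for t in event["tags"] if len(t) >= 2 and t[0] == key[1]}
            if values.isdisjoint(wanted):
                return False
    if "search" in filt:
        # Naive stand-in for the full-text match done by the search backend.
        content = event["content"].lower()
        if not all(term in content for term in filt["search"].lower().split()):
            return False
    return True

def matches_any(event: dict, filters: list[dict]) -> bool:
    """A REQ may carry several filters; matching any one of them is enough."""
    return any(matches_filter(event, f) for f in filters)

# Hypothetical event with shortened, made-up id and pubkey values.
event = {"id": "e1", "pubkey": "p1", "kind": 1, "created_at": 1699000000,
         "tags": [["t", "nostrasia"]], "content": "Hello Nostrasia"}
print(matches_any(event, [{"kinds": [1], "search": "nostrasia hello"}]))
print(matches_any(event, [{"kinds": [0]}, {"#t": ["tokyo"]}]))
```

With an evaluator like this, the relay could check each newly indexed event against open subscriptions instead of querying the backend again.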
(2) Different index lifetime by kind
Currently Searchnos indexes all events on a daily basis
and keeps 30 sub-indices (30 days' worth; configurable).
Elasticsearch can search multiple indices transparently,
and can also delete an index quite efficiently.
In this way, Searchnos puts TTLs on events.
This design was chosen because I mainly targeted real-time
search at first.
But it would be better to have longer TTLs for
some kinds,
for example kind 0 and 30023 (NIP-23 long-form content).
So I want to make it possible to configure TTLs
according to kinds.
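One possible way to get per-kind TTLs while keeping cheap whole-index deletion is to group daily indices by TTL. The sketch below is only an illustration of that idea: the index naming scheme, the TTL values, and the function names are all assumptions, not the Searchnos layout.

```python
from datetime import date, datetime

DEFAULT_TTL_DAYS = 30
TTL_DAYS_BY_KIND = {0: 365, 30023: 365}  # hypothetical values

def ttl_days(kind: int) -> int:
    """TTL in days for an event kind, falling back to the default."""
    return TTL_DAYS_BY_KIND.get(kind, DEFAULT_TTL_DAYS)

def index_name(kind: int, day: date) -> str:
    """Daily index name grouped by TTL, so a whole index expires at once."""
    return f"nostr-ttl{ttl_days(kind)}-{day:%Y.%m.%d}"

def expired(names: list[str], today: date) -> list[str]:
    """Return the index names whose TTL has fully elapsed as of today."""
    out = []
    for name in names:
        _, ttl_part, day_part = name.split("-")
        day = datetime.strptime(day_part, "%Y.%m.%d").date()
        if (today - day).days >= int(ttl_part[len("ttl"):]):
            out.append(name)
    return out

today = date(2023, 11, 3)
names = [
    index_name(1, date(2023, 10, 4)),   # default TTL, 30 days old
    index_name(1, date(2023, 10, 5)),   # default TTL, 29 days old
    index_name(0, date(2023, 10, 4)),   # kind 0, longer TTL
]
print(expired(names, today))
```

With these assumed values, only the 30-day-old default index is reported as expired, so 30 daily indices stay searchable for ordinary kinds while kind 0 and 30023 are kept longer.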
(3) Support policy plugins (like strfry)
Searchnos uses an indexer to filter the events to be indexed.
The indexer sends events to the Searchnos relay using NIP-01,
through a special administrative endpoint,
for spam prevention.
If we had a policy plugin system, Searchnos could receive
events from users directly.
(4) Support more sophisticated queries.
Currently Searchnos treats a query as an AND search over space-separated terms.
(4-a) Logical operations: AND, OR, NOT, ...
In order to achieve this, a search query parser needs to
be implemented.
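As a rough idea of what such a parser could look like, here is a minimal sketch. The syntax is an assumption made for this example and is not defined by Searchnos or NIP-50: spaces mean AND, the keyword `OR` separates alternatives (binding more loosely than AND), a leading `-` negates a term, and there are no parentheses.

```python
def parse(query: str) -> list[list[tuple[bool, str]]]:
    """Parse a query into OR-groups of AND-ed (negated, term) pairs."""
    groups: list[list[tuple[bool, str]]] = [[]]
    for word in query.split():
        if word == "OR":
            groups.append([])
        elif word.startswith("-") and len(word) > 1:
            groups[-1].append((True, word[1:]))
        else:
            groups[-1].append((False, word))
    return [group for group in groups if group]

def evaluate(groups: list[list[tuple[bool, str]]], tokens: set[str]) -> bool:
    """True if any OR-group has all its terms satisfied by the token set."""
    return any(
        all((term in tokens) != negated for negated, term in group)
        for group in groups
    )

tokens = {"one", "beer", "please"}
print(evaluate(parse("beer please"), tokens))   # AND: both terms present
print(evaluate(parse("wine OR beer"), tokens))  # OR: one alternative matches
print(evaluate(parse("beer -please"), tokens))  # NOT: excluded term is present
```

A real implementation would translate the parsed groups into a backend query rather than evaluating them against a token set.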
(4-b) Language-specific search
Users may want to search like "nostrasia lang:ja".
Searchnos internally detects languages using Elasticsearch's language detector.
But how should "lang:ja" be treated?
Where should the query be parsed?
Relay? or Client?
I'm not sure how this should be done.
(4-c) Search events from a specific user
Users may want to query "nostrasia from:darashi".
This is more difficult than the language-specific search,
because we need to join kind 0 events with the search
results.
If we know the pubkey of the user, a NIP-01 filter can
select the events from that user.
But who should convert the query "from:darashi" into the NIP-01
filter?
Relay? or Client?
It's not realistic to expect all clients to implement this.
How about making libraries for clients?
It's not a bad idea, but we would need to implement them
in many programming languages.
On the other hand, if we implement on the relay
side,
implementation differences between search relays may lead to inconsistent results.
Then how should we implement this?
A hybrid approach may be a solution.
What about converting the query to an intermediate representation at
the client,
and sending it to the relay?
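To illustrate what an intermediate representation might look like, the sketch below splits a query into plain terms and `key:value` modifiers, then shows one way the `from` modifier could be resolved into a NIP-01 `authors` filter. The shape of the representation, the modifier names, the function names, and the name-to-pubkey table are all hypothetical; nothing here is a proposed standard.

```python
KNOWN_MODIFIERS = {"lang", "from"}  # hypothetical modifier names

def to_intermediate(query: str) -> dict:
    """Split a raw query into search terms and recognized key:value modifiers."""
    terms, modifiers = [], {}
    for word in query.split():
        key, sep, value = word.partition(":")
        if sep and key in KNOWN_MODIFIERS and value:
            modifiers[key] = value
        else:
            terms.append(word)  # unknown prefixes (e.g. URLs) stay search terms
    return {"terms": terms, "modifiers": modifiers}

def to_filter(ir: dict, pubkeys_by_name: dict[str, str]) -> dict:
    """Turn the representation into a filter; `lang` has no NIP-01 field."""
    filt = {"search": " ".join(ir["terms"])}
    name = ir["modifiers"].get("from")
    if name is not None:
        filt["authors"] = [pubkeys_by_name[name]]  # KeyError if the name is unknown
    return filt

ir = to_intermediate("nostrasia lang:ja from:darashi")
print(ir)
# The pubkey below is a made-up placeholder, not a real key.
print(to_filter(ir, {"darashi": "ab" * 32}))
```

Whichever side performs the second step, agreeing on the first step's output is what would keep results consistent across relays.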
Anyway,
I think the beauty of Nostr is
that each part can be implemented with a little effort.
Complicated specs easily destroy that.
Do you have any good ideas?
Conclusion
I made Searchnos and it's working.
But there are still many things to do.
Building a relay is open to everyone, and it's fun.
Try it! Thank you.
(Please help me if you can translate the questions and
answers.)
npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c