Searchnos & Search on Nostr
Yoji Shidara
November 02, 2023
Nostrasia day 3
2023-11-03
Transcript
Searchnos & Search on Nostr
@darashi
npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c
2023-11-03 Nostrasia day 3
Let me introduce myself,
with my works...
Articles:
I wrote "NIP-01 を読む (Reading NIP-01)" and a full Japanese translation
of NIP-01
for ...
"Hello, Nostr!" Fanzine https://nip-book.nostr-jp.org/
And the second issue is in press now,
"Hello, Nostr! Yo Bluesky!" Fanzine will be available on 12th
November 2023, at Techbookfest 15, Ikebukuro https://nip-book.nostr-jp.org/
I wrote a short summary of the NIP-01 updates since the first issue,
and the Japanese translation of the latest version of NIP-01.
I also wrote "作ってわかる Nostr プロトコル (Understanding the Nostr
Protocol by Writing Code)" for the series on Nostr,
in Software Design, November 2023 issue
My article is the 5th in the series,
written by members of Japanese Nostr community.
You can buy Software Design magazine at bookstores in Japan.
Good as a souvenir :)
Software:
murasaki: Nostr to Speech. A client that reads notes aloud with text-to-speech.
Mapnos: Map Notes and Other Stuff. Shows geotagged kind 1 notes on a map. https://mapnos.vercel.app/
nostrbuzzs: buzzphrase detector for Nostr. Detects trending phrases in real
time. https://nostrbuzzs.deno.dev/
This is a kind of "algo" you might not like.
The point is that anyone can implement their own algo
with Nostr.
I just think it's fun to see what's going on in Nostr,
especially at this early stage of Nostr.
nos.today: Web client for NIP-50 search
Searchnos: NIP-50 relay
Today, I'm going to talk about "Searchnos".
Searchnos is a NIP-50 relay, having Elasticsearch as its backend.
It's open source and available on GitHub. https://github.com/darashi/searchnos
Motivation:
I want to search on Nostr in Japanese. (for nostrbuzzs)
As far as I know, at the time I started developing Searchnos,
relay.nostr.band was the only public relay that supported NIP-50.
relay.nostr.band works very well in many cases,
but I noticed that unexpected results are sometimes returned when querying in Japanese.
I'm not sure, but this may be due to tokenization.
A typical full-text search approach is to tokenize the text into words:
"One beer, please." -> ["one", "beer", "please"]
The query "please beer" (AND search) should match the text "One beer, please."
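The word-based approach above can be sketched in a few lines. This is only an illustration of the idea, not how Searchnos or Elasticsearch tokenizes; the function names `tokenize` and `matches_all` are made up for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all(query: str, text: str) -> bool:
    """AND search: every query token must appear among the text's tokens."""
    return set(tokenize(query)) <= set(tokenize(text))


doc = "One beer, please."
print(tokenize(doc))                    # word tokens of the document
print(matches_all("please beer", doc))  # both words occur in the document
print(matches_all("please bear", doc))  # "bear" does not occur
```

Word order does not matter here because the tokens are compared as sets, which is fine for space-separated languages but is exactly what breaks down in the Japanese case that follows.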
In Japanese, words are not separated by spaces:
ビールを一杯ください。 bii-ru o ippai kudasai (One beer, please.)
Using a technique called morphological analysis, it is possible to
break them into words.
ビールを一杯ください。 -> ["ビール", "を", "一杯", "ください"]
This analysis depends on dictionaries,
and it's not always correct.
It is especially weak with new words.
Another approach is to use N-gram indexing.
ビールを一杯ください。 -> ["ビー", "ール", "ルを", "を一",
"一杯", "杯く", "くだ", "ださ", "さい", "い。"] (bi-gram)
If we query "ルを ビー" (this doesn't make sense),
it will be tokenized as ["ルを", "ビー"].
If we treat these tokens in the same way as English words,
it can result in false positives, because
["ビー", "ール", "ルを", "を一", "一杯", "杯く", "くだ", "ださ", "さい", "い。"] ⊇ ["ルを", "ビー"]
We need to use N-gram indexing and consider the position
of the tokens.
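A minimal sketch of why token positions matter. The text and query below are made up for this illustration (they are not from the talk), and `naive_match` and `positional_match` are hypothetical helper names. The naive version treats bi-grams like independent words; the positional version requires each term's bi-grams to occur at consecutive positions.

```python
def bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def naive_match(query: str, text: str) -> bool:
    """AND search over bi-grams, ignoring where they occur in the text."""
    doc_tokens = set(bigrams(text))
    return all(set(bigrams(term)) <= doc_tokens for term in query.split())


def positional_match(query: str, text: str) -> bool:
    """Each term's bi-grams must appear at consecutive positions."""
    positions: dict[str, set[int]] = {}
    for i, gram in enumerate(bigrams(text)):
        positions.setdefault(gram, set()).add(i)

    def term_matches(term: str) -> bool:
        grams = bigrams(term)
        if not grams:  # single-character term: fall back to a substring test
            return term in text
        return any(
            all(start + k in positions.get(g, ()) for k, g in enumerate(grams))
            for start in positions.get(grams[0], ())
        )

    return all(term_matches(term) for term in query.split())


text = "東京都と京都府"
query = "東京都府"  # every bi-gram occurs in the text, but the string itself does not
print(naive_match(query, text))       # True: a false positive
print(positional_match(query, text))  # False: positions are not consecutive
```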
I won't go into more detail today ...
Anyway, some effort is needed.
In order to tackle Japanese language specific problems,
it seemed like a good idea to have my own
relay implementation.
So I made Searchnos.
Architecture:
In order to simplify the implementation,
Searchnos continuously polls Elasticsearch after EOSE.
Sequence diagram:
Participants: Source Relay, Indexer, Elasticsearch, Searchnos Relay, Client
Messages in order (the diagram contains two loops):
1 REQ, 2 query, 3 response, 4 EVENT (if matched), 5 EVENT (if matched), 6 EOSE,
7 EVENT, 8 EVENT, 9 index request, 10 wait, 11 query, 12 response, 13 EVENT (if matched), 14 CLOSE
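The polling behaviour can be sketched roughly as follows. This is an assumption-laden illustration, not Searchnos's actual code: `search` stands in for a query against Elasticsearch, `send` for writing a message to the client's WebSocket, and the loop runs a fixed number of `polls` instead of running until CLOSE, so that the example terminates.

```python
from typing import Callable


def serve_subscription(
    sub_id: str,
    search: Callable[[], list[dict]],   # hypothetical stand-in for the backend query
    send: Callable[[list], None],       # hypothetical stand-in for the WebSocket send
    polls: int,
    wait: Callable[[], None] = lambda: None,
) -> None:
    """Send stored matches, then EOSE, then keep polling for new matches."""
    seen: set[str] = set()

    def flush() -> None:
        # Forward only events that have not been sent on this subscription yet.
        for event in search():
            if event["id"] not in seen:
                seen.add(event["id"])
                send(["EVENT", sub_id, event])

    flush()
    send(["EOSE", sub_id])
    for _ in range(polls):
        wait()   # e.g. sleep for the polling interval
        flush()


# Tiny in-memory demonstration.
store = [{"id": "a", "content": "hello nostr"}]
outbox: list[list] = []
serve_subscription(
    "sub1",
    lambda: [e for e in store if "nostr" in e["content"]],
    outbox.append,
    polls=1,
    wait=lambda: store.append({"id": "b", "content": "nostr search"}),
)
print(outbox)
```

Re-running the whole query on every poll keeps the relay simple, at the cost of latency and backend load, which is the trade-off the first future-work item addresses.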
I'm running Searchnos at wss://search.nos.today
Some details & future works:
(1) Stop polling Elasticsearch for events after EOSE
Doing this can reduce the latency of search results and the load on Elasticsearch.
We would need to implement a "filter evaluator",
which would duplicate the matching already done through Elasticsearch.
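A "filter evaluator" decides in the relay process whether an incoming event matches a subscription filter, without asking the backend. The sketch below covers the NIP-01 filter fields `ids`, `authors`, `kinds`, `since`, `until`, and single-letter tag filters (`limit` is ignored because it only applies to stored events). The handling of the NIP-50 `search` field is a deliberately naive assumption (whitespace-separated AND matching); making it agree with the backend's analyzer is the duplication mentioned above.

```python
def matches_filter(event: dict, flt: dict) -> bool:
    """Return True if a Nostr event satisfies every condition in the filter."""
    if "ids" in flt and event["id"] not in flt["ids"]:
        return False
    if "authors" in flt and event["pubkey"] not in flt["authors"]:
        return False
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    if "since" in flt and event["created_at"] < flt["since"]:
        return False
    if "until" in flt and event["created_at"] > flt["until"]:
        return False
    for key, wanted in flt.items():
        # Tag filters look like "#e" or "#p": at least one value must match.
        if len(key) == 2 and key.startswith("#"):
            values = {t[1] for t in event["tags"] if len(t) >= 2 and t[0] == key[1]}
            if not values & set(wanted):
                return False
    if "search" in flt:
        # Assumption: naive AND search over whitespace-separated words.
        words = set(event["content"].lower().split())
        if not all(term in words for term in flt["search"].lower().split()):
            return False
    return True


event = {
    "id": "e1", "pubkey": "pk1", "kind": 1, "created_at": 1698969600,
    "tags": [["t", "nostrasia"]], "content": "Hello Nostrasia day 3",
}
print(matches_filter(event, {"kinds": [1], "search": "nostrasia hello"}))
print(matches_filter(event, {"kinds": [0]}))
```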
(2) Different index lifetime by kind
Currently Searchnos indexes all events on a daily basis
and keeps 30 sub-indices (30 days' worth; configurable).
Elasticsearch can search multiple indices transparently,
and can also delete an index quite efficiently.
In this way, Searchnos puts TTLs on events.
This design was chosen because I mainly targeted real-time search at first.
But it would be better to have longer TTLs for some kinds,
for example kind 0 and kind 30023 (NIP-23 long-form content).
So I want to make it possible to configure TTLs
according to kinds.
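One way to picture per-kind TTLs on top of daily indices is to give long-lived kinds their own index series. Everything concrete in this sketch is hypothetical: the index naming scheme, the 365-day values, and the helper names are illustrative assumptions, not Searchnos's configuration.

```python
from datetime import date, timedelta

DEFAULT_TTL_DAYS = 30                     # matches the current 30 daily sub-indices
TTL_DAYS_BY_KIND = {0: 365, 30023: 365}   # hypothetical longer TTLs


def ttl_days(kind: int) -> int:
    """Look up the TTL for a kind, falling back to the default."""
    return TTL_DAYS_BY_KIND.get(kind, DEFAULT_TTL_DAYS)


def index_name(kind: int, day: date) -> str:
    """Hypothetical naming: one daily index series per TTL group."""
    group = f"kind{kind}" if kind in TTL_DAYS_BY_KIND else "default"
    return f"nostr-{group}-{day.isoformat()}"


def is_expired(kind: int, index_day: date, today: date) -> bool:
    """A daily index expires once it is at least ttl_days old."""
    return today - index_day >= timedelta(days=ttl_days(kind))


today = date(2023, 11, 3)
print(index_name(1, today))
print(is_expired(1, date(2023, 10, 4), today))   # 30 days old, default TTL
print(is_expired(0, date(2023, 10, 4), today))   # kind 0 is kept longer
```

Dropping an expired index as a whole keeps deletion cheap, which is the property the daily-index design relies on.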
(3) Support policy plugins (like strfry)
Searchnos uses an indexer to filter the events to be indexed.
The indexer sends events to the Searchnos relay using NIP-01,
via a special administrative endpoint, for spam prevention.
If we had a policy plugin system, Searchnos could receive events from users directly.
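As a rough picture of what a policy plugin could look like: a separate process receives one JSON request per line and answers with one JSON decision per line. The message shape below (an `event` field in the request; `id`, `action`, and `msg` in the response) is modeled on the style of strfry's write-policy plugins but should be read as an assumption, and the keyword rule is a made-up placeholder policy.

```python
import json

BLOCKED_WORDS = {"spam"}  # hypothetical placeholder policy


def decide(request_line: str) -> str:
    """Turn one JSON request line into one JSON decision line."""
    request = json.loads(request_line)
    event = request["event"]
    content = event.get("content", "").lower()
    blocked = any(word in content for word in BLOCKED_WORDS)
    response = {
        "id": event["id"],
        "action": "reject" if blocked else "accept",
        "msg": "blocked: looks like spam" if blocked else "",
    }
    return json.dumps(response)


# A real plugin would loop over sys.stdin and print each decision;
# here the function is called directly on a sample request.
sample = json.dumps({"type": "new", "event": {"id": "e1", "content": "buy spam now"}})
print(decide(sample))
```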
(4) Support more sophisticated queries.
Currently Searchnos treats a query as an AND search over space-separated terms.
(4-a) Logical operations: AND, OR, NOT, ...
To achieve this, a search query parser needs to be implemented.
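A query parser for such operators can be quite small. The grammar below is an assumption chosen for this sketch (NOT binds tighter than AND, AND tighter than OR, adjacent terms mean AND, parentheses group); it is not a proposal from the talk. The result is a nested-tuple syntax tree that could later be translated into a backend query or evaluated directly.

```python
import re


def parse(query: str):
    """Parse a query into a tree of ("term"|"not"|"and"|"or", ...) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":   # explicit AND is optional
                take()
            node = ("and", node, parse_not())
        return node

    def parse_not():
        if peek() is None:
            raise ValueError("unexpected end of query")
        tok = take()
        if tok == "NOT":
            return ("not", parse_not())
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return node
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("term", tok.lower())

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(node, words: set[str]) -> bool:
    """Evaluate a parsed query against a set of lowercase words."""
    op = node[0]
    if op == "term":
        return node[1] in words
    if op == "not":
        return not evaluate(node[1], words)
    if op == "and":
        return evaluate(node[1], words) and evaluate(node[2], words)
    return evaluate(node[1], words) or evaluate(node[2], words)


print(parse("nostr relay"))
print(parse("nostrasia OR (relay NOT spam)"))
```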
(4-b) Language specific search
A user may want to search for something like "nostrasia lang:ja".
Searchnos internally detects languages using Elasticsearch's language detector.
But how to treat "lang:ja"?
Where to parse the query?
Relay? or Client?
I'm not sure how this should be done.
(4-c) Search events from a specific user
A user may want to query "nostrasia from:darashi".
This is more difficult than the language-specific search,
because we need to join kind 0 events with the search results.
If we know the user's pubkey, a NIP-01 filter can select the events from that user.
But who should convert the query "from:darashi" into the NIP-01 filter?
Relay? or Client?
It's not realistic to expect all clients to implement this.
How about making libraries for clients?
It's not a bad idea, but we would need to implement it in many programming languages.
On the other hand, if we implement on the relay
side,
implementation differences between search relays may lead to inconsistent results.
Then how should we implement this?
A hybrid approach may be a solution.
What about converting the query to an intermediate representation at
the client,
and sending it to the relay?
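To make the hybrid idea concrete, here is one possible shape for such an intermediate representation, purely as a thought experiment. `authors` is the standard NIP-01 filter field and `search` is the NIP-50 one, but the `lang` field, the function name, and the `resolve_name` callback (which would map a name to a hex pubkey, for example via kind 0 metadata) are hypothetical.

```python
from typing import Callable, Optional


def to_intermediate(query: str, resolve_name: Callable[[str], Optional[str]]) -> dict:
    """Split "key:value" operators out of a query into filter-like fields."""
    terms: list[str] = []
    result: dict = {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value and key == "lang":
            result["lang"] = value               # hypothetical field
        elif sep and value and key == "from":
            pubkey = resolve_name(value)
            if pubkey is None:
                raise LookupError(f"unknown user: {value}")
            result.setdefault("authors", []).append(pubkey)
        else:
            terms.append(token)                  # plain search term
    result["search"] = " ".join(terms)
    return result


# Made-up directory with a placeholder pubkey, for illustration only.
directory = {"darashi": "00" * 32}
print(to_intermediate("nostrasia from:darashi lang:ja", directory.get))
```

In this split, the client handles name resolution while the relay only sees structured fields plus plain search terms, so relays would not each need their own operator syntax.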
Anyway,
I think the beauty of Nostr is
that each part can be implemented with a little effort.
Complicated specs easily destroy that.
Do you have any good ideas?
Conclusion
I made Searchnos and it's working.
But there are still many things to do.
Building a relay is open to everyone, and it's fun.
Try it! Thank you.
(Please help me if you can translate the questions and
answers.)
npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c