annoy4s

Pishen Tsai
July 05, 2016


Transcript

  1. Pros
    • Fully supports all the functionality of Annoy (indexing/querying, Euclidean/Angular).
    • Didn't rewrite the code; uses the optimized C++ code provided by Annoy.
    • Easily parallelized in Scala: queries.par.map(q => annoy.query(q))
    • JVM with C++ native code is fast and type-safe.
    • Annoy itself is fast (10x faster than lsh4s).
  2. Cons
    • Platform dependent. For now, you need to compile the C++ code yourself if you're not using linux-x86-64:
        > compileNative
        > publish
    • May not be as simple as lsh4s when broadcasting the index onto each worker in Spark.
    • My C++ skill is poor, as well as my JN* knowledge.
  3. JNA in one page

        libraryDependencies += "net.java.dev.jna" % "jna" % "4.2.2"

    • src/main/cpp/annoyjava.cpp
    • src/main/scala/annoy4s/AnnoyLibrary.scala
    • src/main/resources/linux-x86-64/libannoy.so

    [Diagram labels: functions mapping, compile, call]
  4.
        AnnoyIndexInterface<int, float> *createEuclidean(int f) {
          return new AnnoyIndex<int, float, Euclidean, Kiss64Random>(f);
        }

        val annoy: Pointer = lib.createEuclidean(64)

    [Diagram labels: memory address space, JVM -Xmx2G, AnnoyIndex, annoy]
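The slide above shows the native index living outside the JVM heap, reachable from Scala only through a pointer. A minimal sketch of that opaque-handle pattern follows. It assumes a stand-in type, so `StubIndex`, `createStub`, `stubDims` and `deleteStub` are all hypothetical names and not part of Annoy or annoy4s. The `extern "C"` block is there because a by-name symbol lookup from outside C++ generally needs unmangled names. The explicit delete function is also an assumption: the slides do not show how annoy4s releases the index, but memory obtained with `new` on the native side is not reclaimed by the JVM's garbage collector.

```cpp
// Hypothetical stand-in for the Annoy index type; not part of Annoy.
struct StubIndex {
    explicit StubIndex(int f) : dims(f) {}
    int dims;  // dimensionality of the stored vectors
};

// extern "C" disables C++ name mangling, so these functions are exported
// under exactly the names written here.
extern "C" {

// Allocates the index on the native heap and returns an opaque handle.
StubIndex *createStub(int f) { return new StubIndex(f); }

// Reads a field through the handle.
int stubDims(const StubIndex *ptr) { return ptr->dims; }

// Releases the native object; the caller is assumed to call this exactly once.
void deleteStub(StubIndex *ptr) { delete ptr; }

}  // extern "C"
```

On the calling side the handle is just an address: it is small and can be passed around freely, while the object it points at stays outside any `-Xmx` heap limit.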
  5.
        void getNnsByItem(AnnoyIndexInterface<int, float> *ptr, int item, int n, int search_k,
                          int *result, float *distances) {
          vector<int> resultV;
          vector<float> distancesV;
          ptr->get_nns_by_item(item, n, search_k, &resultV, &distancesV);
          std::copy(resultV.begin(), resultV.end(), result);
          std::copy(distancesV.begin(), distancesV.end(), distances);
        }

        val result = Array.fill(10)(-1)
        val distances = Array.fill(10)(-1.0f)
        lib.getNnsByItem(annoy, item, 10, -1, result, distances)

    [Diagram labels: under GC's control, free automatically]
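The wrapper above fills buffers that the caller allocated, so the result arrays stay under the caller's memory management. The same out-parameter pattern can be sketched without Annoy as follows. Here `stubSearch` and `stubNnsByItem` are hypothetical stand-ins, the `maxItem` parameter exists only to make the stub return fewer than `n` results, and returning the number of entries written is an addition that the slide's `void` function does not have.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a nearest-neighbour search: yields up to n
// item ids (item, item + 1, ...) with made-up distances (0.0, 0.5, 1.0, ...),
// stopping before maxItem.
static void stubSearch(int item, int n, int maxItem,
                       std::vector<int> *ids, std::vector<float> *dists) {
    for (int i = 0; i < n && item + i < maxItem; ++i) {
        ids->push_back(item + i);
        dists->push_back(0.5f * static_cast<float>(i));
    }
}

extern "C" {

// Copies at most n results into the caller-owned buffers and returns how
// many entries were written. Slots past that count are left untouched.
int stubNnsByItem(int item, int n, int maxItem, int *result, float *distances) {
    std::vector<int> ids;
    std::vector<float> dists;
    stubSearch(item, n, maxItem, &ids, &dists);

    // Never write more than the n slots the caller promised to provide.
    const std::size_t cap = n > 0 ? static_cast<std::size_t>(n) : 0;
    const std::size_t count = std::min(ids.size(), cap);
    std::copy_n(ids.begin(), count, result);
    std::copy_n(dists.begin(), count, distances);
    return static_cast<int>(count);
}

}  // extern "C"
```

Pre-filling the buffers with -1, as the Scala snippet on the slide does, serves the same purpose as the returned count: the caller can tell which slots were actually filled when fewer than `n` neighbours come back.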