A Study in Bitmap: Is NDK the fast Processing method by CPU?

bigbackboom

July 10, 2024

Transcript

  1. What is the NDK?

    The Android NDK is a toolset that lets you implement parts of your app in native code, using languages such as C and C++. For certain types of apps, this can help you reuse code libraries written in those languages.*2

    *2 https://developer.android.com/ndk
  2. Java implementation

    • The NDK version blends with a fixed blue value
    • The Java version blends with a fixed red value

    Kotlin implementation

        object AlphaBlender {
            // Indices into the color array passed to the blend functions
            const val RED = 0
            const val GREEN = 1
            const val BLUE = 2
            const val ALPHA = 3

            fun processImageJava(
                bmpByteArray: ByteArray,
                rgbArray: IntArray
            ) {
                // ALPHA is a percentage: alpha weights the source pixel,
                // beta weights the fixed blend color
                val alpha = rgbArray[ALPHA] * 0.01
                val beta = 1.0f - (rgbArray[ALPHA] * 0.01)

                // Start after the 54-byte BMP header; each pixel is 3 bytes (B, G, R)
                var index = 54
                while (index < bmpByteArray.size) {
                    // Mask to read the signed bytes as unsigned 0..255 values
                    val a = bmpByteArray[index].toInt() and 0xff
                    val b = bmpByteArray[index + 1].toInt() and 0xff
                    val c = bmpByteArray[index + 2].toInt() and 0xff

                    bmpByteArray[index] = ((alpha * a).toInt() + (beta * rgbArray[BLUE]).toInt()).toByte()
                    bmpByteArray[index + 1] = ((alpha * b).toInt() + (beta * rgbArray[GREEN]).toInt()).toByte()
                    bmpByteArray[index + 2] = ((alpha * c).toInt() + (beta * rgbArray[RED]).toInt()).toByte()

                    index += 3
                }
            }

            // NDK implementation below
            // .
            // .
            // .
        }
  3. NDK implementation

    • For the NDK, declare an external function on the Kotlin side
    • Provide the corresponding C++ implementation

        // AlphaBlender.kt
        object AlphaBlender {
            // .
            // .
            // .
            // Java implementation above

            external fun processImage(
                bmpByteArray: ByteArray,
                rgbaArray: IntArray
            )
        }
  4. NDK implementation

    • Create the C++ function using the package name, the corresponding class name, and the method name.

        #include <jni.h>
        #include <cstdint>
        #include <cstdlib>

        extern "C" JNIEXPORT void JNICALL
        Java_com_bbb_imageprocessor_AlphaBlender_processImage(
                JNIEnv *env,
                jobject thiz,
                jbyteArray bmp_byte_array,
                jintArray rgba_array) {
            // Copy the Java byte array into a native buffer
            jsize size = env->GetArrayLength(bmp_byte_array);
            uint8_t *p = (uint8_t *) malloc(sizeof(uint8_t) * size);
            env->GetByteArrayRegion(bmp_byte_array, 0, size, (jbyte *) p);

            // Copy the four color values (R, G, B, alpha percentage)
            int *rgba = (int *) malloc(sizeof(int) * 4);
            env->GetIntArrayRegion(rgba_array, 0, 4, (jint *) rgba);

            const float a = rgba[3] * 0.01f;
            const float b = 1.0f - (rgba[3] * 0.01f);

            for (int i = 0; i < 1; i++) {
                // Start after the 54-byte BMP header; each pixel is 3 bytes (B, G, R)
                for (int j = 54; j < size; j = j + 3) {
                    p[j] = (a * p[j]) + (b * (uint8_t) rgba[2]);
                    p[j + 1] = (a * p[j + 1]) + (b * (uint8_t) rgba[1]);
                    p[j + 2] = (a * p[j + 2]) + (b * (uint8_t) rgba[0]);
                }
            }

            // Write the blended bytes back into the Java array
            env->SetByteArrayRegion(bmp_byte_array, 0, size, (jbyte *) p);

            free(rgba);
            free(p);
        }
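The naming rule can be illustrated with a small sketch. It assumes the Kotlin object lives in the package com.bbb.imageprocessor (inferred from the function name on the slide) and that all names use only ASCII letters, digits, '.', and '_'. The helper jni_symbol_name is hypothetical and is not part of the JNI API; overloaded methods and other characters need further escapes that it does not handle.

```cpp
#include <string>

// Hypothetical helper (not part of JNI): builds the function name the JVM
// looks up for a native method. Handles only '.', '/', and '_' specially.
std::string jni_symbol_name(const std::string& qualified_class,
                            const std::string& method) {
    auto mangle = [](const std::string& part) {
        std::string out;
        for (char c : part) {
            if (c == '.' || c == '/') {
                out += '_';    // package separator becomes an underscore
            } else if (c == '_') {
                out += "_1";   // a literal underscore is escaped as _1
            } else {
                out += c;
            }
        }
        return out;
    };
    // "Java_" prefix, mangled class name, underscore, mangled method name
    return "Java_" + mangle(qualified_class) + "_" + mangle(method);
}
```

Calling jni_symbol_name("com.bbb.imageprocessor.AlphaBlender", "processImage") gives the function name used on the slide. The _1 escape is why underscores in package or method names make native function names harder to read.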
  5. NDK implementation

    • Create the C++ function using the package name, the corresponding class name, and the method name.
    • The first parameter is JNIEnv *env
    • The second parameter is jobject thiz

        extern "C" JNIEXPORT void JNICALL
        Java_com_bbb_imageprocessor_AlphaBlender_processImage(
                JNIEnv *env,
                jobject thiz,
                jbyteArray bmp_byte_array,
                jintArray rgba_array)
  6. NDK implementation

    • From the third parameter onward, pass whatever arguments you need.
    • Note that even primitive types become special types named jxxxxx.

        extern "C" JNIEXPORT void JNICALL
        Java_com_bbb_imageprocessor_AlphaBlender_processImage(
                JNIEnv *env,
                jobject thiz,
                jbyteArray bmp_byte_array,
                jintArray rgba_array)
  7. NDK implementation

    • Use env to get the array size

        jsize size = env->GetArrayLength(bmp_byte_array);
  8. NDK implementation

    • Use env to get the array size
    • Allocate the array dynamically with malloc. C/C++ limits how large a fixed-size array can be, so malloc is used
    • Copy bmp_byte_array into p

        jsize size = env->GetArrayLength(bmp_byte_array);
        uint8_t *p = (uint8_t *) malloc(sizeof(uint8_t) * size);
        env->GetByteArrayRegion(bmp_byte_array, 0, size, (jbyte *) p);
  9. NDK implementation

    • Using the same approach as before, get the color array for the alpha blend

        int *rgba = (int *) malloc(sizeof(int) * 4);
        env->GetIntArrayRegion(rgba_array, 0, 4, (jint *) rgba);
  10. NDK implementation

    • Using the same approach as before, get the color array for the alpha blend
    • Perform the alpha blend pixel by pixel.

        const float a = rgba[3] * 0.01f;
        const float b = 1.0f - (rgba[3] * 0.01f);

        for (int i = 0; i < 1; i++) {
            for (int j = 54; j < size; j = j + 3) {
                p[j] = (a * p[j]) + (b * (uint8_t) rgba[2]);
                p[j + 1] = (a * p[j + 1]) + (b * (uint8_t) rgba[1]);
                p[j + 2] = (a * p[j + 2]) + (b * (uint8_t) rgba[0]);
            }
        }
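To try the per-pixel blend outside Android, a minimal sketch without JNI might look like the following. It is a variation, not the deck's code: it uses integer arithmetic instead of float factors so the result is easy to check by hand. The function name blend_bgr and its parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Blends every 3-byte B, G, R pixel after the header with a fixed color.
// alpha_percent is the weight of the original pixel (0..100), playing the
// role of rgba[3] * 0.01 in the deck; integer math is this sketch's choice.
void blend_bgr(std::vector<std::uint8_t>& bmp, std::size_t header_size,
               std::uint8_t red, std::uint8_t green, std::uint8_t blue,
               int alpha_percent) {
    assert(alpha_percent >= 0 && alpha_percent <= 100);
    const int beta_percent = 100 - alpha_percent;
    const std::uint8_t color[3] = {blue, green, red};  // BMP stores B, G, R
    // Stop before a trailing partial pixel so i + 2 never runs past the end.
    for (std::size_t i = header_size; i + 2 < bmp.size(); i += 3) {
        for (std::size_t c = 0; c < 3; ++c) {
            const int mixed = alpha_percent * bmp[i + c] + beta_percent * color[c];
            bmp[i + c] = static_cast<std::uint8_t>(mixed / 100);
        }
    }
}
```

With alpha_percent set to 50, a pixel stored as B=200, G=100, R=0 blended with pure blue value 100 becomes B=150, G=50, R=0. For the deck's BMP input, header_size would be 54.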
  11. NDK implementation

    • Using the same approach as before, get the color array for the alpha blend
    • Perform the alpha blend pixel by pixel.
    • Copy the values back into bmp_byte_array. Because it is a reference, this is enough to apply the change

        env->SetByteArrayRegion(bmp_byte_array, 0, size, (jbyte *) p);
  12. NDK implementation

    • Allocated memory is not freed automatically, so be sure to clean up.

        free(rgba);
        free(p);
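One common C++ way to make this cleanup harder to forget is to tie the malloc'd buffer to an owning handle. The sketch below is an alternative to the manual free calls, not what the deck does. The names FreeDeleter, MallocBuffer, and make_buffer are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Deleter that hands malloc'd memory back with free().
struct FreeDeleter {
    void operator()(void* ptr) const { std::free(ptr); }
};

// Owning handle for a malloc'd byte buffer (hypothetical alias).
using MallocBuffer = std::unique_ptr<std::uint8_t[], FreeDeleter>;

// Allocates size bytes; the result is null if malloc fails.
MallocBuffer make_buffer(std::size_t size) {
    return MallocBuffer(static_cast<std::uint8_t*>(std::malloc(size)));
}
```

A buffer created with make_buffer(size) is released when the handle goes out of scope, including on early returns, and get() yields the raw pointer that a copy call would need.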
  13. Performance

    Test environment

    Device: Google Pixel 8
    CPU: Google Tensor G3
    Memory: 8GB LPDDR5X RAM

    Test image:
    • RGB (no alpha channel)
    • 5184 × 3456 (53MB)

    App:
    • minifyEnabled
    • GCC: O2 optimization flag
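The 53MB figure matches what an uncompressed 24-bit BMP of that resolution would occupy. The sketch below works this out, assuming a 54-byte header (14-byte file header plus 40-byte info header, the same offset the code skips) and no color table. The function name bmp_file_size is made up for illustration.

```cpp
#include <cstdint>

// Size in bytes of an uncompressed BMP: a 54-byte header followed by
// pixel rows, each padded up to a multiple of 4 bytes.
std::uint64_t bmp_file_size(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytes_per_pixel) {
    const std::uint64_t header_size = 54;
    const std::uint64_t row_bytes = (width * bytes_per_pixel + 3) / 4 * 4;
    return header_size + row_bytes * height;
}
```

For 5184 × 3456 at 3 bytes per pixel, each row is 15552 bytes, already a multiple of 4, so the total is 54 + 15552 × 3456 = 53,747,766 bytes, roughly 53.7MB.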
  14. Performance

    Run   JVM    NDK
    1     91ms   46ms
    2     65ms   63ms
    3     60ms   75ms
    4     59ms   42ms
    5     60ms   51ms
    6     60ms   45ms
    7     59ms   38ms
    8     60ms   62ms
    9     59ms   45ms
    10    60ms   33ms
  15. Performance

    Summary

    NDK:
    • Average: 50ms
    • Max: 75ms
    • Min: 33ms
    • Median: 45.5ms (middle values 45ms and 46ms)

    JVM:
    • Average: 63.3ms
    • Max: 91ms
    • Min: 59ms
    • Median: 60ms
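These summary figures can be recomputed from the ten per-run timings. The sketch below shows one way to do it. The Summary struct and summarize function are illustrative names, and the median of an even-sized sample is taken as the mean of the two middle values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Summary {
    double average;
    double median;
    int min;
    int max;
};

// Computes average, median, min, and max of timings given in milliseconds.
Summary summarize(std::vector<int> samples_ms) {
    assert(!samples_ms.empty());
    std::sort(samples_ms.begin(), samples_ms.end());
    const std::size_t n = samples_ms.size();
    const double total =
        std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0);
    // Odd count: the middle value. Even count: mean of the two middle values.
    const double median =
        (n % 2 == 1) ? samples_ms[n / 2]
                     : (samples_ms[n / 2 - 1] + samples_ms[n / 2]) / 2.0;
    return {total / n, median, samples_ms.front(), samples_ms.back()};
}
```

Feeding in the NDK column (46, 63, 75, 42, 51, 45, 38, 62, 45, 33) yields an average of 50ms and a median of 45.5ms. The JVM column (91, 65, 60, 59, 60, 60, 59, 60, 59, 60) yields an average of 63.3ms and a median of 60ms.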
  16. Sample Repository:

    ❖ Kotlin 2.0
    ❖ Multi-Module with NDK
    ❖ Jetpack Compose

    https://github.com/BigBackBoom/hades