Robust, Distributed, and Parallel Processing for Enormous Images Using Supervisor, Node, Flow, Nx, and Evision

What do you use when you process enormous images? Python, NumPy, and OpenCV are certainly helpful, but don’t you want to speed things up by processing the images in a distributed and parallel way? Elixir can do it:

1. You can replace NumPy and OpenCV with Nx and Evision.
2. Node and Flow can make the processing distributed and parallel.
3. Supervisor makes it robust against crashes caused by excessive memory consumption.

This presentation will introduce satellite image processing for an information provision system of sediment disasters as an example case study shown at ElixirConf US 2020.

Susumu Yamazaki (ZACKY)

August 31, 2022

Transcript

  1. © 2022 Susumu Yamazaki
    Robust, Distributed, and Parallel Processing for Enormous Images Using Supervisor, Node, Flow, Nx, and Evision: For Remote Sensing by Artificial Satellites
    Susumu Yamazaki (ZACKY), University of Kitakyushu, Fukuoka, Japan.
  2. About Presenter: Susumu Yamazaki (ZACKY)
    • An Associate Professor at the University of Kitakyushu.
    • Current research interests:
      • System and social implementation using Elixir, Phoenix, Nerves, and Nx.
      • The satellite image processing system built with them.
    • Bold means today’s topics.
    • The creator of the Pelemay series.
    • A co-organizer of ElixirConf JP.
  3. My motivation: To build the satellite image processing system
    • SAR satellites can observe the Earth’s surface at any time, even at night or in bad weather, because of their operating principle.
    • They can identify a 70 cm (about 2 feet) square object on the Earth’s surface.
    • An image from the satellite is massive: tens of thousands of pixels square.
    • See the movie on YouTube for details!
  4. First question for you: What is a better programming environment when we process enormous images?
  5. 1. We can use Nx and Evision for image processing instead of NumPy and OpenCV.
  6. Nx: A multi-dimensional tensor library for Elixir
    • Nx aims to be the foundation of machine learning for Elixir.
    • It is developed by Dashbit.
    • It can replace NumPy and TensorFlow.
    • It has accelerators including EXLA and Torchx.
      • EXLA uses Google’s XLA (Accelerated Linear Algebra).
      • Torchx uses LibTorch.
    Reference: Introducing Nx - José Valim | Lambda Days 2021
  7. Evision: An OpenCV-Elixir binding
    • OpenCV is open-source software for computer vision.
    • Evision is developed by Cocoa Xu.
    • Available at https://github.com/cocoa-xu/evision
    • Integrates with Nx.
    • Available modules: calib3d, core, dnn, features2d, flann, highgui, imgcodecs, imgproc, ml, photo, stitching, ts, video, videoio
    • The presentation at ElixirConf EU 2022 by Cocoa Xu will be available on YouTube.
  8. 2. We can use Node and Flow to accelerate such processing with distribution and parallelization.
  9. Node: Powerful Distribution Functionality
    • A node is any Erlang runtime system with a name and a cookie.
    • Nodes that share the same cookie can send messages to one another using Node.spawn_link/2.
    • Node.spawn_link/2 calls the given function at the specified node.
    • It also returns a process id of a stub process of the function call.
    • We can send a message via the stub process id to the actual process at the remote node.
    Reference: https://elixirschool.com/en/lessons/advanced/otp_distribution
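
The Node.spawn_link/2 call pattern from slide 9 can be tried with a minimal sketch like the one below. It is not taken from the presenter's gists. As an assumption made so that the script runs without starting distribution, the local node (Node.self/0) stands in for a remote node; with real nodes started with the same cookie, the target would be a name such as :"worker@host" (a hypothetical name).

```elixir
# Minimal sketch: run a function on a node with Node.spawn_link/2 and get a reply.
# Assumption: the local node stands in for a remote node, so no --name or
# --cookie flags are needed to run this script.
parent = self()
target = Node.self()

pid =
  Node.spawn_link(target, fn ->
    # The spawned process waits for one request and replies to the sender.
    receive do
      {:sum, from, numbers} -> send(from, {:result, node(), Enum.sum(numbers)})
    end
  end)

# Send work to the spawned process through the returned pid.
send(pid, {:sum, parent, [1, 2, 3]})

result =
  receive do
    {:result, _from_node, total} -> total
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "sum computed on #{inspect(target)}")
```

Passing the caller's pid inside the message is what lets the spawned process send its result back, which is the same receiver pattern the later distributed slides rely on.
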
  10. Flow: Computational Flows with Stages
    • Flow is similar to the Enum and Stream modules, although computations will be executed in parallel.
    • Flow is 2.16x and 2.34x faster than Enum and Stream, respectively, as shown by a benchmark that runs the word-counter sample code from the Flow documentation on a sample 2 MB text file.
    Reference: https://hexdocs.pm/flow/Flow.html
  11. Proposal: If the image processing can be separated safely, divide a large image among computing units so that it is processed more quickly.
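
As a rough illustration of this proposal, the sketch below divides an image into horizontal strips, processes the strips in parallel, and concatenates the results. It is not the code from the gists: the image is assumed to be a plain list of pixel rows rather than an Nx tensor, the standard library's Task.async_stream/3 is swapped in for Flow so that the script needs no dependencies, and StripSketch is a hypothetical module name. The per-pixel operation is "safe" to separate in the sense used above because no pixel depends on its neighbours.

```elixir
defmodule StripSketch do
  # Hypothetical helper module for illustration only.

  # Divide an image (a list of pixel rows) into horizontal strips.
  def divide(image, rows_per_strip), do: Enum.chunk_every(image, rows_per_strip)

  # Stitch the strips back together in their original order.
  def concatenate(strips), do: Enum.concat(strips)

  # Apply pixel_fun to every pixel, one task per strip.
  # Task.async_stream/3 returns results in input order by default.
  def process(image, rows_per_strip, pixel_fun) do
    image
    |> divide(rows_per_strip)
    |> Task.async_stream(fn strip ->
      Enum.map(strip, fn row -> Enum.map(row, pixel_fun) end)
    end)
    |> Enum.map(fn {:ok, strip} -> strip end)
    |> concatenate()
  end
end

# A tiny 5x2 "image"; real satellite images are tens of thousands of pixels square.
image = [[0, 10], [20, 30], [40, 50], [60, 70], [80, 90]]
inverted = StripSketch.process(image, 2, fn pixel -> 255 - pixel end)
IO.inspect(inverted)
```

An operation that reads neighbouring pixels, such as a blur, would need overlapping strips or extra handling at the strip borders, so it does not fit this simple division as written.
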
  12. Horizontal Image Division (Explain Code)
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/7777c34c89ae797c581a946c08cba6c4
  13. Horizontal Image Concatenation (Explain Code)
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/c029e2c1999d36f67ac2d408af82bd86
  14. Vertical Image Division (See this later)
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/d3f590b7ce9ba9de1f8653b4fdffbf95
  15. Tiled Image Division (See this later)
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/ab48d2cea7a9aecd8d8cefa00c2b2ba5
  16. Parallel Image Processing by Flow (Explain Code)
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/3a31f9eafa8e168a6be6bfd560981aad
  17. Distributed Image Processing by Node (Explain Code)
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/62c4bf532db7a72260bd90a528c56bcf
  18. Third question: What should we do to prevent abnormal termination in the middle of processing, for example because of a memory shortage?
  19. 3. We can use Supervisor, which makes it robust against crashes caused by excessive memory consumption.
  20. Supervisor: Monitoring Child Processes
    • A Supervisor is a specialized process for monitoring other processes.
    • It enables us to create fault-tolerant applications by automatically restarting the child processes when they fail.
    • To use it, create a new project using the mix new command with the --sup option.
    • Then, the supervisor code application.ex will be generated.
    • Define child processes as GenServers, and add their module names, or tuples of a module name and its configuration options, to children in the supervisor code.
    • Then, the supervisor will start monitoring the children.
    Reference: https://elixirschool.com/en/lessons/advanced/otp_supervisors
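
The supervision described on slide 20 can be sketched as a single script, without generating a mix project. ImageWorker and its tile_size option are hypothetical names used only for illustration; in a project created with mix new --sup, the children list would live in the generated application.ex instead.

```elixir
defmodule ImageWorker do
  # Hypothetical worker standing in for a GenServer that processes image pieces.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, opts}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

# A child is given as a module name or as a {module, options} tuple.
children = [
  {ImageWorker, [tile_size: 1024]}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

# Simulate a crash by killing the worker, then wait for the supervisor to restart it.
old_pid = Process.whereis(ImageWorker)
Process.exit(old_pid, :kill)

new_pid =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Process.whereis(ImageWorker)
  end)
  |> Enum.find(fn pid -> is_pid(pid) and pid != old_pid end)

IO.inspect({old_pid, new_pid}, label: "worker pid before and after restart")
IO.inspect(GenServer.call(ImageWorker, :ping), label: "restarted worker replies")
```

The restarted worker is a fresh process with a fresh state, so any partial results held by the crashed worker are lost unless they were stored elsewhere.
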
  21. A technical issue: Because Evision uses NIFs, an abort inside Evision may terminate the entire Elixir runtime abnormally.
  22. When a NIF aborts, it terminates the entire Elixir runtime abnormally, even if a supervisor monitors it.
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/70b2c7e4e9eb2d76c7a04aa39172c54b
  23. Solution: HtPipe
    Prevents NIFs from terminating the entire Elixir runtime abnormally, by spawning a child Elixir and communicating with it through Node.
    1. Run the mix project as a node with elixir --name main_node_name --cookie cookie_name -S mix run.
    2. Spawn a child Elixir with System.cmd/3, :os.cmd/1, or :os.cmd/2, using the command elixir --name child_node_name --cookie cookie_name -S mix run.
    3. Spawn a process on the child node with one of the Node.spawn functions, passing the child_node_name and the process id of a receiver.
    4. Send a message to the process on the child node, if necessary.
    5. Receive the results from the child at the receiver process.
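
The isolation idea behind these steps can be illustrated with a much simpler sketch. It is not HtPipe's API or implementation: it only uses System.cmd/3 (step 2) to run risky work in a separate operating-system process, and it replaces the Node messaging of steps 3 to 5 with the child's standard output and exit status. The child halting with status 3 is an assumption that imitates a NIF abort, and the elixir executable is assumed to be on the PATH.

```elixir
# Sketch: a crash in a child Elixir VM does not take down the parent VM.

# The child halts with a non-zero status to imitate a NIF aborting the VM.
{_output, status} =
  System.cmd("elixir", ["-e", "System.halt(3)"], stderr_to_stdout: true)

IO.puts("child exited with status #{status}; the parent is still running")

# A healthy child returns its result through standard output.
{output, 0} = System.cmd("elixir", ["-e", "IO.puts(Enum.sum(1..10))"])
result = output |> String.trim() |> String.to_integer()

IO.inspect(result, label: "result received from the child")
```

Starting a fresh VM for every call is slow, which is one reason to keep a named child node alive and exchange messages with it as in the steps above.
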
  24. Demo: HtPipe prevents the entire Elixir runtime from terminating abnormally.
    Scan the QR code to get the gist with the source code.
    Source code is also available at https://gist.github.com/zacky1972/08be6d1da99519da50df2e1642b3d364
  25. HtPipe solves the technical issue that an abort inside Evision may terminate the entire Elixir runtime abnormally.
  26. Yes! This seems a good idea! Broadway should be useful for the integration of such image processing.
  27. Broadway should be useful for such integration. Why? Because
    • It can build concurrent and multi-stage data processing pipelines.
      ➡ It enables us to process enormous images in a distributed and parallel way.
    • It can make the pipelines robust.
      ➡ It can make them more robust if its supervisors are enhanced to prevent NIFs from terminating the entire Elixir runtime abnormally, similar to HtPipe.
    Reference: https://github.com/dashbitco/broadway