PyParis 2018: exploring image processing pipelines

Emmanuelle Gouillart

November 15, 2018

Transcript

  1. Exploring image processing pipelines with scikit-image, joblib, ipywidgets and dash

    A bag of tricks for processing images faster. Emmanuelle Gouillart,
    joint unit CNRS/Saint-Gobain SVI and the scikit-image team. @EGouillart
  2. A typical pipeline

    How to discover & select the different algorithms? How to iterate
    quickly towards a satisfying result? How to verify processing results?
  3. Introducing scikit-image: a NumPy-ic image processing library for science

    >>> from skimage import io, filters
    >>> camera_array = io.imread('camera_image.png')
    >>> type(camera_array)
    <type 'numpy.ndarray'>
    >>> camera_array.dtype
    dtype('uint8')
    >>> filtered_array = filters.gaussian(camera_array, sigma=5)
    >>> type(filtered_array)
    <type 'numpy.ndarray'>

    Submodules correspond to different tasks: I/O, filtering, segmentation...
    Compatible with 2D and 3D images.
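    The snippet above assumes a local file 'camera_image.png'. A
    self-contained variant (my substitution, not from the deck) uses the
    sample image bundled with scikit-image instead:

    # Same pipeline, but on skimage's bundled camera image, so it runs
    # without any local file.
    from skimage import data, filters

    camera_array = data.camera()          # uint8 ndarray, shape (512, 512)
    filtered_array = filters.gaussian(camera_array, sigma=5)
    print(type(filtered_array), filtered_array.dtype)  # ndarray, float64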
  4. Convenience functions: NumPy operations as one-liners

    labels = measure.label(im)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    keep_only_large = (sizes > 1000)[labels]
  5. Convenience functions: NumPy operations as one-liners

    labels = measure.label(im)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    keep_only_large = (sizes > 1000)[labels]

    morphology.remove_small_objects(im)

    See also: clear_border, relabel_sequential, find_boundaries,
    join_segmentations
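    A runnable check that the one-liner reproduces the NumPy idiom (the toy
    image and the min_size value are my assumptions; remove_small_objects
    keeps objects of at least min_size pixels, so min_size=1001 matches
    "size > 1000"):

    import numpy as np
    from skimage import measure, morphology

    # A toy binary image: one large blob and one small speck.
    im = np.zeros((100, 100), dtype=bool)
    im[10:60, 10:60] = True      # 2500-pixel object, kept
    im[80:84, 80:84] = True      # 16-pixel object, dropped

    # The "by hand" NumPy idiom from the slide:
    labels = measure.label(im)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                 # never keep the background
    keep_only_large = (sizes > 1000)[labels]

    # The equivalent one-liner:
    one_liner = morphology.remove_small_objects(im, min_size=1001)

    assert np.array_equal(keep_only_large, one_liner)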
  6. More interaction for faster discovery: web applications made easy

    @app.callback(
        dash.dependencies.Output('image-seg', 'figure'),
        [dash.dependencies.Input('slider-min', 'value'),
         dash.dependencies.Input('slider-max', 'value')])
    def update_figure(v_min, v_max):
        mask = np.zeros(img.shape, dtype=np.uint8)
        mask[img < v_min] = 1
        mask[img > v_max] = 2
        seg = segmentation.random_walker(img, mask, mode='cg_mg')
        return {'data': [go.Heatmap(z=img, colorscale='Greys'),
                         go.Contour(z=seg, ncontours=1,
                                    contours=dict(start=1.5, end=1.5,
                                                  coloring='lines'),
                                    line=dict(width=3))]}
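    The slide shows only the callback. A minimal app skeleton it could plug
    into might look as follows; this is a sketch assuming the component ids
    used above ('slider-min', 'slider-max', 'image-seg') and a sample image,
    none of which come from the deck:

    import numpy as np
    import plotly.graph_objs as go
    from dash import Dash, dcc, html
    from skimage import data, segmentation

    img = data.coins()           # any 2D grayscale image

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Graph(id='image-seg'),
        dcc.Slider(id='slider-min', min=0, max=255, value=50),
        dcc.Slider(id='slider-max', min=0, max=255, value=150),
    ])

    # ... register the update_figure callback from the slide here ...

    if __name__ == '__main__':
        app.run_server(debug=True)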
  7. Keeping interaction easy for large data

    from joblib import Memory

    memory = Memory('./cachedir', verbose=0)  # recent joblib takes the
                                              # location as first argument

    @memory.cache
    def mem_label(x):
        return measure.label(x)

    @memory.cache
    def mem_threshold_otsu(x):
        return filters.threshold_otsu(x)

    [...]

    val = mem_threshold_otsu(dat)
    objects = dat > val
    median_dat = mem_median_filter(dat, 3)
    val2 = mem_threshold_otsu(median_dat[objects])
    liquid = median_dat > val2
    segmentation_result = np.copy(objects).astype(np.uint8)
    segmentation_result[liquid] = 2
    aggregates = mem_binary_fill_holes(objects)
    aggregates_ds = np.copy(aggregates[::4, ::4, ::4])
    cores = mem_binary_erosion(aggregates_ds, np.ones((10, 10, 10)))
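    Why this helps: when the script is re-run during exploration, every
    @memory.cache'd step whose inputs have not changed is loaded from
    './cachedir' instead of being recomputed. A tiny self-contained
    illustration (my example, not from the deck):

    import numpy as np
    from joblib import Memory
    from skimage import filters

    memory = Memory('./cachedir', verbose=0)

    @memory.cache
    def mem_gaussian(x, sigma):
        return filters.gaussian(x, sigma=sigma)

    img = np.random.default_rng(0).random((2000, 2000))
    a = mem_gaussian(img, 10)   # computed, then written to ./cachedir
    b = mem_gaussian(img, 10)   # loaded from the cache, not recomputed
    assert np.array_equal(a, b)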
  8. joblib: easy simple parallel computing + lazy re-evaluation

    import numpy as np
    from joblib import Parallel, delayed

    def apply_parallel(func, data, *args, chunk=100, overlap=10,
                       n_jobs=4, **kwargs):
        """Apply a function in parallel to overlapping chunks of an array.

        joblib is used for parallel processing.

        [...]

        Examples
        --------
        >>> from skimage import data, filters
        >>> coins = data.coins()
        >>> res = apply_parallel(filters.gaussian, coins, 2)
        """
        sh0 = data.shape[0]
        nb_chunks = sh0 // chunk
        end_chunk = sh0 % chunk
        arg_list = [data[max(0, i * chunk - overlap):
                         min((i + 1) * chunk + overlap, sh0)]
                    for i in range(0, nb_chunks)]
        if end_chunk > 0:
            arg_list.append(data[-end_chunk - overlap:])
        res_list = Parallel(n_jobs=n_jobs)(
            delayed(func)(sub_im, *args, **kwargs) for sub_im in arg_list)
        output_dtype = res_list[0].dtype
        out_data = np.empty(data.shape, dtype=output_dtype)
        for i in range(1, nb_chunks):
            out_data[i * chunk:(i + 1) * chunk] = \
                res_list[i][overlap:overlap + chunk]
        out_data[:chunk] = res_list[0][:-overlap]
        if end_chunk > 0:
            out_data[-end_chunk:] = res_list[-1][overlap:]
        return out_data
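    The overlap exists because filters such as a Gaussian need pixels from
    neighbouring chunks: each chunk is extended by `overlap` pixels, and the
    extension is trimmed away when the results are stitched back together.
    A usage check (my addition; the chunk and overlap values are arbitrary,
    and overlap must exceed the filter's footprint for the comparison to
    hold):

    import numpy as np
    from skimage import data, filters

    coins = data.coins()
    res = apply_parallel(filters.gaussian, coins, 2, chunk=64, overlap=16)
    # overlap=16 is well beyond the ~8-pixel radius of a sigma=2 Gaussian,
    # so the chunked result matches the direct computation.
    assert np.allclose(res, filters.gaussian(coins, 2))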
  9. Conclusions

    Explore as much as possible. Take advantage of the documentation (and
    maybe improve it!). Keep the pipeline interactive. Check what you're
    doing; use meaningful visualizations.