PyParis 2018: exploring image processing pipelines

Emmanuelle Gouillart

November 15, 2018

Transcript

  1. Exploring image processing pipelines with scikit-image, joblib, ipywidgets and dash

    A bag of tricks for processing images faster. Emmanuelle Gouillart,
    joint unit CNRS/Saint-Gobain SVI and the scikit-image team. @EGouillart
  2. A typical pipeline

    How to discover & select the different algorithms? How to iterate
    quickly towards a satisfying result? How to verify processing results?
  3. Introducing scikit-image: a NumPy-ic image processing library for science

    >>> from skimage import io, filters
    >>> camera_array = io.imread('camera_image.png')
    >>> type(camera_array)
    <type 'numpy.ndarray'>
    >>> camera_array.dtype
    dtype('uint8')
    >>> filtered_array = filters.gaussian(camera_array, sigma=5)
    >>> type(filtered_array)
    <type 'numpy.ndarray'>

    Submodules correspond to different tasks: I/O, filtering, segmentation...
    Compatible with 2D and 3D images.
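    The snippet above assumes a local file 'camera_image.png'. A
    self-contained variant (my substitution, not from the deck) uses the
    sample image bundled with scikit-image instead:

    # Same pipeline, but on skimage's bundled camera image, so it runs
    # without any local file.
    from skimage import data, filters

    camera_array = data.camera()          # uint8 ndarray, shape (512, 512)
    filtered_array = filters.gaussian(camera_array, sigma=5)
    print(type(filtered_array), filtered_array.dtype)  # ndarray, float64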
  4. Convenience functions: NumPy operations as one-liners

    labels = measure.label(im)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    keep_only_large = (sizes > 1000)[labels]
  5. Convenience functions: NumPy operations as one-liners

    labels = measure.label(im)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    keep_only_large = (sizes > 1000)[labels]

    morphology.remove_small_objects(im)

    See also: clear_border, relabel_sequential, find_boundaries,
    join_segmentations
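    A runnable check that the one-liner reproduces the NumPy idiom (the toy
    image and the min_size value are my assumptions; remove_small_objects
    keeps objects of at least min_size pixels, so min_size=1001 matches
    "size > 1000"):

    import numpy as np
    from skimage import measure, morphology

    # A toy binary image: one large blob and one small speck.
    im = np.zeros((100, 100), dtype=bool)
    im[10:60, 10:60] = True      # 2500-pixel object, kept
    im[80:84, 80:84] = True      # 16-pixel object, dropped

    # The "by hand" NumPy idiom from the slide:
    labels = measure.label(im)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                 # never keep the background
    keep_only_large = (sizes > 1000)[labels]

    # The equivalent one-liner:
    one_liner = morphology.remove_small_objects(im, min_size=1001)

    assert np.array_equal(keep_only_large, one_liner)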
  6. More interaction for faster discovery: web applications made easy

    @app.callback(
        dash.dependencies.Output('image-seg', 'figure'),
        [dash.dependencies.Input('slider-min', 'value'),
         dash.dependencies.Input('slider-max', 'value')])
    def update_figure(v_min, v_max):
        mask = np.zeros(img.shape, dtype=np.uint8)
        mask[img < v_min] = 1
        mask[img > v_max] = 2
        seg = segmentation.random_walker(img, mask, mode='cg_mg')
        return {'data': [go.Heatmap(z=img, colorscale='Greys'),
                         go.Contour(z=seg, ncontours=1,
                                    contours=dict(start=1.5, end=1.5,
                                                  coloring='lines'),
                                    line=dict(width=3))]}
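    The slide shows only the callback. A minimal app skeleton it could plug
    into might look as follows; this is a sketch assuming the component ids
    used above ('slider-min', 'slider-max', 'image-seg') and a sample image,
    none of which come from the deck:

    import numpy as np
    import plotly.graph_objs as go
    from dash import Dash, dcc, html
    from skimage import data, segmentation

    img = data.coins()           # any 2D grayscale image

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Graph(id='image-seg'),
        dcc.Slider(id='slider-min', min=0, max=255, value=50),
        dcc.Slider(id='slider-max', min=0, max=255, value=150),
    ])

    # ... register the update_figure callback from the slide here ...

    if __name__ == '__main__':
        app.run_server(debug=True)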
  7. Keeping interaction easy for large data

    from joblib import Memory

    memory = Memory('./cachedir', verbose=0)  # recent joblib takes the
                                              # location as first argument

    @memory.cache
    def mem_label(x):
        return measure.label(x)

    @memory.cache
    def mem_threshold_otsu(x):
        return filters.threshold_otsu(x)

    [...]

    val = mem_threshold_otsu(dat)
    objects = dat > val
    median_dat = mem_median_filter(dat, 3)
    val2 = mem_threshold_otsu(median_dat[objects])
    liquid = median_dat > val2
    segmentation_result = np.copy(objects).astype(np.uint8)
    segmentation_result[liquid] = 2
    aggregates = mem_binary_fill_holes(objects)
    aggregates_ds = np.copy(aggregates[::4, ::4, ::4])
    cores = mem_binary_erosion(aggregates_ds, np.ones((10, 10, 10)))
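    Why this helps: when the script is re-run during exploration, every
    @memory.cache'd step whose inputs have not changed is loaded from
    './cachedir' instead of being recomputed. A tiny self-contained
    illustration (my example, not from the deck):

    import numpy as np
    from joblib import Memory
    from skimage import filters

    memory = Memory('./cachedir', verbose=0)

    @memory.cache
    def mem_gaussian(x, sigma):
        return filters.gaussian(x, sigma=sigma)

    img = np.random.default_rng(0).random((2000, 2000))
    a = mem_gaussian(img, 10)   # computed, then written to ./cachedir
    b = mem_gaussian(img, 10)   # loaded from the cache, not recomputed
    assert np.array_equal(a, b)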
  8. joblib: easy simple parallel computing + lazy re-evaluation

    import numpy as np
    from joblib import Parallel, delayed

    def apply_parallel(func, data, *args, chunk=100, overlap=10,
                       n_jobs=4, **kwargs):
        """Apply a function in parallel to overlapping chunks of an array.

        joblib is used for parallel processing.

        [...]

        Examples
        --------
        >>> from skimage import data, filters
        >>> coins = data.coins()
        >>> res = apply_parallel(filters.gaussian, coins, 2)
        """
        sh0 = data.shape[0]
        nb_chunks = sh0 // chunk
        end_chunk = sh0 % chunk
        arg_list = [data[max(0, i * chunk - overlap):
                         min((i + 1) * chunk + overlap, sh0)]
                    for i in range(0, nb_chunks)]
        if end_chunk > 0:
            arg_list.append(data[-end_chunk - overlap:])
        res_list = Parallel(n_jobs=n_jobs)(
            delayed(func)(sub_im, *args, **kwargs) for sub_im in arg_list)
        output_dtype = res_list[0].dtype
        out_data = np.empty(data.shape, dtype=output_dtype)
        for i in range(1, nb_chunks):
            out_data[i * chunk:(i + 1) * chunk] = \
                res_list[i][overlap:overlap + chunk]
        out_data[:chunk] = res_list[0][:-overlap]
        if end_chunk > 0:
            out_data[-end_chunk:] = res_list[-1][overlap:]
        return out_data
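    The overlap exists because filters such as a Gaussian need pixels from
    neighbouring chunks: each chunk is extended by `overlap` pixels, and the
    extension is trimmed away when the results are stitched back together.
    A usage check (my addition; the chunk and overlap values are arbitrary,
    and overlap must exceed the filter's footprint for the comparison to
    hold):

    import numpy as np
    from skimage import data, filters

    coins = data.coins()
    res = apply_parallel(filters.gaussian, coins, 2, chunk=64, overlap=16)
    # overlap=16 is well beyond the ~8-pixel radius of a sigma=2 Gaussian,
    # so the chunked result matches the direct computation.
    assert np.allclose(res, filters.gaussian(coins, 2))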
  9. Conclusions

    Explore as much as possible. Take advantage of the documentation (and
    maybe improve it!). Keep the pipeline interactive. Check what you're
    doing; use meaningful visualizations.