LINXS workshop: scikit-image

Introduction to scikit-image Emmanuelle Gouillart joint Unit CNRS/Saint-Gobain SVI and
the scikit-image team @EGouillart

Looking at materials from the inside 3-D movie at 1000◦
C! Can we save energy when melting glass?

Getting quantitative data out of images

Complex workﬂows Several steps: pre-processing, region extraction, post-processing several possible
algorithms for each step, parameter selection need for intermediate visalizations A lot of trial and error!

Some problems I want to solve Scientiﬁc / development problems
Finding image processing pipelines for extracting scientiﬁc information Running these operations on very large datasets (> 1 Tb) in reasonable time Removing bugs in scikit-image

Some problems I want to solve Scientiﬁc / development problems
Finding image processing pipelines for extracting scientiﬁc information Running these operations on very large datasets (> 1 Tb) in reasonable time Removing bugs in scikit-image Social and learning problems Working with people with little experience in programming and image processing Helping people to learn programming and image processing How can we have more developers and users of scikit-image?

What is scikit-image? An open-source (BSD) generic image processing library
for the Python language (and NumPy data arrays)

What is scikit-image? An open-source (BSD) generic image processing library
for the Python language (and NumPy data arrays) for 2D & 3D images simple API & gentle learning curve

What is image processing? Manipulations of images to Transform them
for other purposes: color balance, enhancement, filtering, inpainting, ... Extract information from them: classification, measurements... a) flltering b) feature extraction c) segmentation d) measures non-local means denoising total variation denoising blob detection super-pixels skeleton & local diameter particle properties segmentation from markers ridge detection

Traditional image processing: no future?

Traditional image processing: no future? Is deep learning the universal
solution? Image classiﬁcation From Keras blog Image segmentation FCN, from Daniil Pakhomov’s blog

Still room for other tools! Non-natural images

Still room for other tools! Non-natural images Avoid sledgehammer approach
simple operations: image resiz- ing, histogram equalization, ... pre-processing for machine learning

First steps with scikit-image >>> from skimage import data ,
measure >>> im = data.binary_blobs (128, n_dim =3, ← volume_fraction =0.2) >>> im.shape # 3-D array (128 , 128, 128) >>> labels = measure.label(im) >>> type(im), type(labels) # images are numpy arrays (<type ’numpy.ndarray ’>, <type ’numpy.ndarray ’>) >>> labels [0, 0, 0] # pixels are array elements 0 x Visualization with Mayavi Images are NumPy arrays Pixels are array elements Simple API relying on functions new = function(image) Most functions work for 2D & 3D Optional parameters: keyword arguments

The Scientiﬁc Python ecosystem 1 75 ... 32 ... 10
3 ... 2 8 23 ... 36 ... 1 13 ... 0 12 43 ... 6 ... 66 ... 97 1 8 5 ... 73 ... 78 98 ... 9 4 10 ... 55 ... 10 3 ... 9 1 41 ... 98 ... 38 75 ... 6 ... visualization Development environment machine learning image properties object properties image ﬁle(s) array

Scipy lecture notes Train a lot of people: need tools
that scale Several weeks of tutorials! Beginners: the core of Scientiﬁc Python Advanced: learn more tricks Packages: speciﬁc applications and packages Developed and used for Euroscipy conferences Curated and enriched over the years

Manipulating images as numerical (numpy) arrays Pixels are arrays elements
import numpy as np image = np.ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x

Manipulating images as numerical (numpy) arrays Pixels are arrays elements
import numpy as np image = np.ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x >>> coffee.shape (400, 600, 3) >>> red channel = coffee[..., 0] >>> image 3d = np.ones((100, 100, 100))

NumPy-native: images as NumPy arrays NumPy arrays as arguments and
outputs >>> from skimage import io , filters >>> camera_array = io.imread(’camera_image .png’) >>> type(camera_array ) <type ’numpy.ndarray ’> >>> camera_array .dtype dtype(’uint8 ’) >>> filtered_array = filters.gaussian(camera_array , ← sigma =5) >>> type( filtered_array ) <type ’numpy.ndarray ’> >>> import matplotlib.pyplot as plt >>> plt.imshow(filtered_array , cmap=’gray ’) x

How we simpliﬁed the API Before 2013 >>> from skimage
import io , filters >>> camera_array = io.imread(’camera_image .png’) >>> type(camera_array ) Image ... >> camera.max() Image (255 , dtype=uint8) x

Versatile use for 2D, 2D-RGB, 3D... >>> from skimage import
measure >>> labels_2d = measure.label(image_2d) >>> labels_3d = measure.label(image_3d) x

Versatile use for 2D, 2D-RGB, 3D... def quickshift(image , ratio
=1.0, kernel_size =5, ← max_dist =10, sigma =0, random_seed =42): """ Segments image using quickshift clustering in← Color -(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ... x

Versatile use for 2D, 2D-RGB, 3D... def quickshift(image , ratio
=1.0, kernel_size =5, ← max_dist =10, sigma =0, random_seed =42): """ Segments image using quickshift clustering in← Color -(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ... x One ﬁlter = one function Use keyword argument for parameter tuning

API of scikit-image skimage ﬁlters restoration segmentation ... denoise_bilateral input
array + optional parameters output (array) submodule module function variables

Consistent names

Consistent names filters . median ( image , selem=None ,
← out=None , . . . ) morphology . binary_erosion ( image , ← selem=None , out=None ) restoration . denoise_bilateral (← image , win_size=None , . . . )

Filtering: transforming image data skimage.filter, skimage.exposure, skimage.restoration

Geometrical transformations skimage.transform scale, zoom, rotate, swirl, warp, ...

Extracting features skimage.feature, skimage.filters

Feature extraction followed by classification Combining scikit-image and scikit-learn Daisy
features Random Forest classifier Back to image processing: Gaussian filtering, thresholding and mathematical morphology.

Measures on images skimage.measure

Segmentation: labelling regions skimage.segmentation

Datasheet Package statistics http://scikit-image.org/ Release 0.14 (1 - 2 release
per year) > 200 contributors, 5-10 maintainers 20000 unique visitors / month Among 1000 best ranked packages on PyPi

Documentation: learning image processing and scikit-image

Docstrings now and then docstring in 2008 Definition: np.diff(a, n=1,
axis =-1) Docstring: Calculate the n-th order discrete difference along ← given axis. x

Docstrings now and then Definition : np.diff(a, n=1, axis =-1)
Docstring: Calculate the n-th order discrete difference along given axis. The first order difference is given by ‘‘out[n] = a[n+1] - a[n]‘‘ along the given axis , higher order differences are calculated by using ‘diff ‘ recursively . Parameters ---------- a : array_like Input array n : int , optional The number of times values are differenced. axis : int , optional The axis along which the difference is taken , default is the last axis. Returns ------- diff : ndarray The ‘n‘ order differences . The shape of the output is the same as ‘a‘ except along ‘axis ‘ where the dimension is smaller by ‘n‘. See Also -------- gradient , ediff1d , cumsum Examples -------- >>> x = np.array ([1, 2, 4, 7, 0]) >>> np.diff(x) much better now! Parameters and their type Suggestion of other functions Simple example

NumPy documentation standard https://github.com/numpy/numpy/blob/master/doc/example.py def foo(var1 , var2 , long_var_name
=’hi’) : r"""A one -line summary that does not use variable names or the function name. Several sentences providing an extended description . Refer to variables using back -ticks , e.g. ‘var ‘. Parameters ---------- var1 : array_like Array_like means all those objects -- lists , nested lists , etc. -- that can be converted to an array. We can also refer to variables like ‘var1 ‘. var2 : int The type above can either refer to an actual Python type (e.g. ‘‘int ‘‘), or describe the type of the variable in more detail , e.g. ‘‘(N,) ndarray ‘‘ or ‘‘array_like ‘‘. Long_variable_name : {’hi ’, ’ho ’}, optional Choices in brackets , default first when optional. Returns ------- type Explanation of anonymous return value of type ‘‘type ‘‘. describe : type Explanation of return value named ‘describe ‘. out : type Explanation of ‘out ‘. Other Parameters ---------------- only_seldom_used_keywords : type Explanation

Documentation at a glance: galleries of examples

Getting started: ﬁnding documentation

Umbrella project: sphinx-gallery

Auto documenting your API with links to examples

Picture denoising

Denoising tomography images In-situ imaging of phase separation in silicate
melts From basic (generic) to advanced (speciﬁc) ﬁlters

Denoising tomography images Histogram of pixel values From basic (generic)
to advanced (speciﬁc) ﬁlters bilateral = restoration.denoise_bilateral(dat) bilateral = restoration.denoise_bilateral(dat, sigma_range← =2.5, sigma_spatial=2) tv = restoration.denoise_tv_chambolle(dat, weight=0.5)

Learning by yourself

Towards a more interactive documentation? https://mybinder.org/v2/gh/scikit-image/skimage-tutorials/master

Towards a more interactive documentation?

Boosted learning: contributing to open-source Learning more about programming Learning
more about image processing

Development process: datasheet Package statistics http://scikit-image.org/ Release 0.14 (1 -
2 release per year) Among 500 best ranked packages on PyPi (> 106 downloads per year) 20000 unique visitors / month

The people

The people 2 4 6 8 10 12 Individual Committer
0.0 0.2 0.4 0.6 0.8 1.0 Commit rate Normalized commit rates since 2014-06 matplotlib numpy pandas scikit-image scikit-learn scipy A quite healthy curve... we can do better! Fernando Perez & Aaron Meurer, Gist 5843625

Infrastructure: how to build your code

More interaction for faster discovery: widgets

More interaction for faster discovery: web applications made easy https://dash.plot.ly/

Keeping interaction easy for large data from joblib import Memory
memory = Memory ( cachedir=’ . / c a c h e d i r ’ , verbose=0) @memory . cache def mem_label ( x ) : r e t u r n measure . label ( x ) @memory . cache def mem_threshold_otsu ( x ) : r e t u r n filters . threshold_otsu ( x ) [ . . . ] val = mem_threshold_otsu ( dat ) objects = dat > val median_dat = mem_median_filter ( dat , 3) val2 = mem_threshold_otsu ( median_dat [ objects ] ) liquid = median_dat > val2 segmentation_result = np . copy ( objects ) . astype ( np . uint8 ) segmentation_result [ liquid ] = 2 aggregates = mem_binary_fill_holes ( objects ) aggregates_ds = np . copy ( aggregates [ : : 4 , : : 4 , : : 4 ] ) cores = mem_binary_erosion ( aggregates_ds , np . ones ((10 , 10 ,← 10) ) )

Massive data processing and parallelization Competitive environment: some other tools
use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory use memory mapping of different file formats (raw binary with NumPy, hdf5 with pytables). Divide into blocks: use util.view as blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable

joblib: easy simple parallel computing + lazy re-evaluation import numpy
as np from sklearn . externals . joblib import Parallel , delayed def apply_parallel ( func , data , ∗args , chunk =100, overlap =10, n_jobs=4, ∗∗kwargs ) : ””” Apply a f u n c t i o n i n p a r a l l e l to o v e r l a p p i n g chunks of an a r r a y . j o b l i b i s used f o r p a r a l l e l p r o c e s s i n g . [ . . . ] Examples − − − − − − − − > > > from skimage import data , f i l t e r s > > > c o i n s = data . c o i n s () > > > r e s = a p p l y p a r a l l e l ( f i l t e r s . gaussian , coins , 2) ””” sh0 = data . shape [ 0 ] nb_chunks = sh0 // chunk end_chunk = sh0 % chunk arg_list = [ data [ max (0 , i∗chunk − overlap ) : min (( i+1)∗chunk + overlap , sh0 ) ] f o r i i n range (0 , nb_chunks ) ] i f end_chunk > 0 : arg_list . append ( data[−end_chunk − overlap : ] ) res_list = Parallel ( n_jobs=n_jobs ) ( delayed ( func ) ( sub_im , ∗args , ∗∗kwargs ) f o r sub_im i n arg_list ) output_dtype = res_list [ 0 ] . dtype out_data = np . empty ( data . shape , dtype=output_dtype ) f o r i i n range (1 , nb_chunks ) : out_data [ i∗chunk : ( i+1)∗chunk ] = res_list [ i ] [ overlap : overlap+chunk ] out_data [ : chunk ] = res_list [0][: − overlap ] i f end_chunk > 0 : out_data[−end_chunk : ] = res_list [ −1][ overlap : ] r e t u r n out_data

Experimental chunking and parallelization

Other tools Version control: git/github Debugger: %profile Proﬁler: lineprofiler package
with @profile decorator

A platform to build an ecosystem upon Tool for users,
platform for other tools $ apt-cache rdepends python-matplotlib ... 96 Python packages & applications Speciﬁc applications that could build on scikit-image Imaging techniques; microscopy, tomography, ... Fields: cell biology, astronomy, ... Requirements: stable API, good docs

dash-canvas demo Package: http://github.com/plotly/dash-canvas Gallery: http://dash-playground.plotly.host/canvas-gallery/

LINXS workshop: scikit-image

LINXS workshop: scikit-image

More Decks by Emmanuelle Gouillart

Other Decks in Science

Featured

Transcript