Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rethinking Extension Programming

Rethinking Extension Programming

Invited Keynote. EuroSciPy 2012. Brussels. Talk was recorded, but the video seems to be lost to the sands of time. A shame as I live-coded some LLVM examples. Contact me if its ever found.

David Beazley

August 25, 2012
Tweet

More Decks by David Beazley

Other Decks in Programming

Transcript

  1. "Much of scientific programming is exploratory in nature, and for

    that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments."
  2. "Much of scientific programming is exploratory in nature, and for

    that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997)
  3. NUMERICAL PYTHON Paul F. Dubois, Konrad Hinsen, and James Hugunin

    Department Editor: Paul F. Dubois [email protected] P)1hon is a small and easy-la-learn hmgl!age with surpris- ing capabilities. It is an interpreted objecl-orienT.ed script- ing language and has a full range of sophisticated features such as first-class functions, garbage collection, and excep- tion handling. Python has properties that m"ke it especially appealing for scientific programming: Python is quite simple and easy to learn, but it is a full and complete language. It is simple to extend Python with your own compiled objects and functions. Python is portable, from Unix to Windows 95 to Linux to Macintosh. Python is free, with no license required e'/en if you make a commercial product out of it. _ Python has a large user-contributed library of "modules" These modules cover a wide variety of needs, such as audio. benchmarks for the basic language and the numerical extension are available on the Python Web site (http://www. python.org). The numerical extension is still in beta test and may therefore change slightly from this description. In par- ticular, the beta-test period is needed to sort out some controversies in naming and coercion rules. But with this tutorial as a start and the latest readme file for the numerical extension, you should be able to start using it. Note that you will need to add the extension to the Python source; every effort is made to keep a "minimal Python" as small as possible, as Python is being used for applications where a small size is important. Python is extremely well suited to the development of programmable ap- plications, as has been advocated on these pages (CiP S:I, 1994, p. 70). It has a scripting language as the user interface and compiled code for the compute-in- tensive portions. Introducing Python Python is an interpreter. You can either enter commands directly into the interpreter Of,more commonly, create a file containing a script. On Unix, you can invoke Python with the script as the first argument, or you can use the llsual trick of starting the script with a comment like this: # !/usr/localibinlpython Then you give execute permission to the script file. When you execute it, the Python interpreter is invoked on the script file itself. Since the above line is a comment as far as Python is concerned, it is then ignored. w ex th by 80 lis so us tw > { A l obj f The fi In m tupl fir 2 COMPUTERS IN PHYSICS, VOL. 10,NO.3. MAYIJUN 1996 © 1996 AMERICAN INSTITUTE OF PHYSICS 0894_1866/96110(3)126216/$10.00
  4. "Imagine a physics program written in Fortran, whose user interface

    is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, define functions, and so on."
  5. "Imagine a physics program written in Fortran, whose user interface

    is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, define functions, and so on." 1994
  6. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS hours
  7. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak
  8. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak Unworkable!
  9. 15 of 35 Scriptable Scientific Software [email protected] SWIG (Simplified Wrapper

    and Interface Generator) • SWIG is a compiler that automatically constructs scripting interfaces given ANSI C/C++ declarations. • Can be used to quickly build scripting interfaces from header files. • Avoids the problem of writing wrapper code by hand. • Disclaimer : SWIG is not the only approach. %module SPaSM %{ #include "SPaSM.h" %} void memory(int natoms); void geometry(double ... void processors(int ... void set_boundary_period ... >>> from SPaSM import * >>> memory(20000) >>> geometry(0,0,0,80,80, 160,2.0) ... SWIG SWIG Interface file Python SWIG: C declarations to Python wrappers. (Details don't really matter here)
  10. But, I've long been bothered... What is Swig exactly? -

    A gross hack? - A programming languages project? - A software engineering project? - A good idea? - A bad idea? - Insanity? Nobody on my Ph.D. committee quite knew. (and neither did I)
  11. It's not easy to wrap your brain around it... Example:

    typemaps %define Name ## _input_binary(TYPEMAP, SIZE) %typemap(in,noblock=1,fragment=#SWIG_AsCharPtrAndSize) (TYPEMAP, SIZE) (int res, Char *buf = 0, size_t size = 0, int alloc = 0) { res = SWIG_AsCharPtrAndSize($input, &buf, &size, &alloc); if (!SWIG_IsOK(res)) { %argument_fail(res, "(TYPEMAP, SIZE)", $symname, $argnum); } $1 = ($1_ltype) buf; $2 = ($2_ltype) size - 1; } %typemap(freearg,noblock=1,match="in") (TYPEMAP, SIZE) { if (alloc$argnum == SWIG_NEWOBJ) %delete_array(buf$argnum); } %enddef Don't ask me--I don't understand it either
  12. Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler

    University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming lan- guage designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that inter- operating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages for- mally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07)
  13. Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler

    University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming lan- guage designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that inter- operating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages for- mally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07) e = · · · | (MSG⇥ e) e = · · · | (GSM ⇥ e) E = · · · | (MSG⇥ E) E = · · · | (GSM ⇥ E) ⌥S e : TST ⌥M (MSG⇥ e) : ⇤ ⌥M e : ⇤ ⌥S (GSM ⇥ e) : TST E[MSG n]M ⇥ E[n] E[MSG v]M ⇥ E[MSG (wrong “Non-number”)] v ⌅= n E[MSG⇥1⇥⇥2 ⇥x.e]M ⇥ E[⇥x : ⇤1.MSG⇥2 ((⇥x.e) (GSM ⇥1 x))] x not free in e E[MSG⇥1⇥⇥2 v]M ⇥ E[MSG⇥1⇥⇥2 wrong “Non-procedure”] v ⌅= ⇥x.e E[(GSM n)]S ⇥ E[n] E[(GSM ⇥1⇥⇥2 v)]S ⇥ E[(⇥x. (GSM ⇥2 (v (MSG⇥1 x))))] Figure 3. Extensions to figure 1 to form the simple natural embed- ding e = · · · | (⇥MSN e) e = · · · | (G⇥ e) | (SM ⇥ N e) E = · · · | (⇥MSN E) E = · · · | (G⇥ E) | (SM ⇥ N E) ⌥S e : TST ⌥S (G⇥ e) : TST ⌥S e : TST ⌥M (⇥MSN e) : ⇤ ⌥M e : ⇤ ⌥S (SM ⇥ N e) : ⇤ E[ MSN n]M ⇥ E[n] E[SMN n]S ⇥ E[n] E[⇥1⇥⇥2MSN ⇥x.e]M ⇥ E[⇥x : ⇤1.SM ⇥2 N ((⇥x.e) (SM ⇥1 N x))] E[(SM ⇥1⇥⇥2 N v)]S ⇥ E[(⇥x.(SM ⇥2 N (v (⇥1MSN x))))] E[(G n)]S ⇥ E[n] E[(G v)]S ⇥ E[wrong “Non-number”] (v ⌅= n) E[(G⇥1⇥⇥2 (⇥x.e))]S ⇥ E[(⇥x⇤.(G⇥2 ((⇥x.e)(G⇥1 x⇤))))] E[(G⇥1⇥⇥2 v)]S ⇥ E[wrong “Non-procedure”] (v ⌅= ⇥x.e) Figure 4. Extensions to figure 1 to form the separated-guards natural embedding guarded bounda with unguarded Theorem 3. Fo the following pr (1 (2 where ⇤ is obse In other wor boundaries with program, and th boundaries is eq guards and unc figure 3 is the sa 3.3 A further While the guard mentation based checks. For insta alent to (⇥x : The check perfo check performe the value is com that the convers We can refin essary checks. W written G⇥ + , tha negative guards, Scheme. Their r E[(G+ n)]S E[(G+ v)]S E[(G⇥1⇥⇥2 + v)]S E[(G⇥1⇥⇥2 + v)]S E[(G v)]S E[(G⇥1⇥⇥2 v)]S The function that result from
  14. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations.
  15. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs!
  16. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++.
  17. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++. Scripting is just this add-on... it's just scripting.
  18. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea!
  19. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea! But I'm still bothered!
  20. be fast enough for most such calculations. More computationally intensive

    programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Interpreters will simply
  21. - Paul Dubois (Computers in Physics, Mar. 1997) "Much of

    scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Extensions are still heavily tied to the C compiler - Generation of C code - Invocation of the C compiler (behind scenes) - Shared libraries - Dynamic loading - Other associated issues
  22. Question : Could you get rid of the compiler? (if

    so, what would that look like?)
  23. The Future? This LLVM stuff is kind of fun... ...

    just the sort of thing to keep a physicist hacker busy.
  24. from numba.decorators import jit from numba import int32 @jit(arg_types=[int32[:,:], int32[:,:]],

    ret_type=int32[:,:]) def filter2d(image, filt): M, N = image.shape Mf, Nf = filt.shape Mf2 = Mf // 2 Nf2 = Nf // 2 result = numpy.zeros_like(image) for i in range(Mf2, M - Mf2): for j in range(Nf2, N - Nf2): num = 0.0 for ii in range(Mf): for jj in range(Nf): num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj]) result[i, j] = num return result