The Django Meta API, A journey of optimization and design

Daniel Pyrathon

November 15, 2014


Transcript

  1. The Meta API: A journey of optimization and design

    Lessons learned from a SoC project. Daniel Pyrathon, @pirosb3
  2. Google Summer of Code: Program details

    • A global program that offers students stipends to write code for open source projects
    • One of Google’s ways to give back to Open Source
    • A great occasion for you to give back to open source!
    • May 19 to August 18
    • Nearly 200 Open Source organisations
    • Each organisation has 1 or more proposed projects (you can also propose others)
    • 5,000 USD for the project
  3. What is the Meta API: A very simple definition

    • An internal API, hidden under the _meta object within each model
    • Allows Django to introspect a model’s internals
    • Makes a lot of Django’s model magic possible
    • … so basically you don’t know about it, but you may have used it
  4. What is the Meta API: What is inside the Meta object

    Provides metadata about the model:
    • model name
    • app name
    • abstract?
    • proxy?
    • database table name
    • table permissions
    • primary key
  5. What is the Meta API: What is inside the Meta object

    Provides metadata and references to fields and relations in a model:
    • Field names
    • Field instances
    • Model relations
    • Field attributes

    >>> User._meta.fields
    (<django.db.models.fields.AutoField: id>,
     <django.db.models.fields.CharField: password>,
     <django.db.models.fields.DateTimeField: last_login>,
     ... ... )
  6. What is the Meta API: Which apps use the Meta API

    Developers have always used it, even though it’s not officially supported:
    • django-nonrel
    • django-taggit
    • django-rest-framework
    • ..and many more
  7. Complexity: Previous Meta API entry-points (14)

    def many_to_many(self):
    def get_m2m_with_model(self):
    def get_field(self, name, many_to_many=True):
    def get_field_by_name(self, name):
    def get_all_field_names(self):
    def get_all_related_objects(self, local_only=False, include_hidden=False,
    def get_all_related_objects_with_model(self, local_only=False, ..)
    def get_all_related_many_to_many_objects(self, local_only=False):
    def get_all_related_m2m_objects_with_model(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def get_fields_with_model(self):
    def get_concrete_fields_with_model(self):
    def fields(self):
  8. Complexity: Distinction between Fields and Related Objects

    Diagram: Brand, Item
    (1) Item has a ForeignKey to Brand
    (2) Brand has a RelatedObject to Item as a consequence of the relation

    Fields: any field defined on the model, with or without a relation.
    Related Objects: objects created on the current model as a consequence of another model’s relation to it.
  9. Complexity: Previous Meta API entry-points

    • 10 entry-points
    • 4 cached properties
    • 6 separate caching systems for the API
    • Distinction between 4 different types of fields: fields, m2m, related_objects, related_m2m
    • Never been tested
  10. Complexity: Difference between properties and concepts

    All Model fields
    • fields: excludes fields that are M2M
    • many_to_many: only fields that are M2M

    All Model Related Objects
    • related_objects: excludes RO coming from M2M relations
    • related_m2m: only RO coming from M2M relations

    The decision to split these concepts into multiple properties is only an implementation detail.
  11. Complexity: Previous Meta API entry-points (14)

    def many_to_many(self):
    def get_m2m_with_model(self):
    def get_field(self, name, many_to_many=True):
    def get_field_by_name(self, name):
    def get_all_field_names(self):
    def get_all_related_objects(self, local_only=False, include_hidden=False,
    def get_all_related_objects_with_model(self, local_only=False, ..)
    def get_all_related_many_to_many_objects(self, local_only=False):
    def get_all_related_m2m_objects_with_model(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def get_fields_with_model(self):
    def get_concrete_fields_with_model(self):
  12. The new Meta API: Philosophy

    • An official API that everyone can use without risk of breakage.
    • A fast API that Django’s internals can also use.
    • An intuitive API, simple to use and documented.
  13. Complexity: New Meta API entry-points (7)

    def get_fields(self, forward, reverse, ..):
    def get_field(self, field_name):
    def field_names(self):
    def fields(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def related_objects(self):
  14. The new Meta API: 3 intuitive return types

    field_names
    >>> User._meta.field_names
    set(['name', 'email', ..])

    cached properties
    >>> User._meta.fields
    (<django.db.models.fields.AutoField: id>, ..,)

    get_field()
    >>> Person._meta.get_field('name')
    <django.db.models.fields.CharField: name>
  15. Complexity: New Meta API entry-points

    • 2 entry-points
    • 5 cached properties
    • Only 1 cache layer
    • Distinction between related objects and fields
    • 46 test cases
  16. The new Meta API: A single generator function, get_fields()

    Every cached property depends on get_fields():
    fields, many_to_many, related_objects, field_names, get_field()

    • Consistency
    • Maintainability
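The idea of deriving every cached property from one generator can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyOptions`, its constructor arguments and the tuple layout it yields are hypothetical and are not Django's implementation; `functools.cached_property` (Python 3.8+) stands in for the caching decorator.

```python
from functools import cached_property


class ToyOptions:
    """Hypothetical stand-in for a model's _meta object (not Django code)."""

    def __init__(self, forward_fields, reverse_relations):
        # Each forward field is a (name, is_m2m) pair; reverse relations are names.
        self._forward = tuple(forward_fields)
        self._reverse = tuple(reverse_relations)

    def get_fields(self, forward=True, reverse=True):
        """Single entry point: yield (direction, name, is_m2m) for each field."""
        if forward:
            for name, is_m2m in self._forward:
                yield ("forward", name, is_m2m)
        if reverse:
            for name in self._reverse:
                yield ("reverse", name, False)

    @cached_property
    def fields(self):
        # Forward fields that are not many-to-many.
        return tuple(n for _, n, m2m in self.get_fields(reverse=False) if not m2m)

    @cached_property
    def many_to_many(self):
        # Forward fields that are many-to-many.
        return tuple(n for _, n, m2m in self.get_fields(reverse=False) if m2m)

    @cached_property
    def related_objects(self):
        # Everything that arrives through a reverse relation.
        return tuple(n for _, n, _m2m in self.get_fields(forward=False))

    @cached_property
    def field_names(self):
        return frozenset(n for _, n, _m2m in self.get_fields())


opts = ToyOptions(
    [("id", False), ("name", False), ("groups", True)],
    ["logentry"],
)
```

Because each property is only a filter over `get_fields()`, a change to the traversal logic happens in one place, which is the consistency and maintainability argument from the slide.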
  17. The new Meta API: An overview of get_fields()

    Diagram: SuperModel, AbstractModel, Model; get_fields() called in steps 1 to 4

    get_fields() needs to take inheritance, model swapping, and proxy models into consideration.
    Calls to get_fields() are entirely recursive.
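A minimal sketch of the recursive traversal, assuming a simplified model description. `ToyModel`, its attributes and this free-standing `get_fields()` are hypothetical; the sketch ignores model swapping, proxy models and diamond inheritance, all of which the real method has to handle.

```python
class ToyModel:
    """Hypothetical model description: local field names plus parent models."""

    def __init__(self, name, local_fields, parents=()):
        self.name = name
        self.local_fields = tuple(local_fields)
        self.parents = tuple(parents)


def get_fields(model, include_parents=True):
    """Collect fields from the parents first, then from the model itself."""
    fields = []
    if include_parents:
        for parent in model.parents:
            # Recursive step: each parent resolves its own ancestors.
            fields.extend(get_fields(parent))
    fields.extend(model.local_fields)
    # Return an immutable result so callers cannot alter shared data.
    return tuple(fields)


super_model = ToyModel("SuperModel", ["id"])
abstract_model = ToyModel("AbstractModel", ["created"], parents=[super_model])
model = ToyModel("Model", ["title"], parents=[abstract_model])

all_fields = get_fields(model)
local_only = get_fields(model, include_parents=False)
```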
  18. The new Meta API: Caching layers and recursiveness

    Diagram: SuperModel, AbstractModel, Model; get_fields() called in steps 1 to 4; Cache

    Caching is computed for each layer recursively.
    • Less computation per layer
    • Duplicate data being set
    • Cache invalidation..
  19. The new Meta API: Related objects graph

    • A graph of connections between models
    • Generates a map between models and connections
    • Efficient: computed once for everyone and cached
    • Still really expensive on first lookup: it visits every field of every model in every app
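The map between models and connections can be sketched with plain dictionaries. The registry layout and `build_relation_tree()` below are assumptions made for illustration, not the data structures Django uses; the point is that one pass over every relation of every model produces a lookup table that can then be shared.

```python
from collections import defaultdict

# Hypothetical registry: model name -> list of (field name, target model name).
MODELS = {
    "Brand": [],
    "Item": [("brand", "Brand")],
    "Review": [("item", "Item"), ("author", "User")],
    "User": [],
}


def build_relation_tree(models):
    """Walk every relation of every model once and index it by its target."""
    tree = defaultdict(list)
    for model_name, relations in models.items():
        for field_name, target in relations:
            tree[target].append((model_name, field_name))
    # Freeze the result so every caller can share the same objects safely.
    return {target: tuple(sources) for target, sources in tree.items()}


relation_tree = build_relation_tree(MODELS)
```

The first call pays for the full walk; later lookups such as `relation_tree["Brand"]` are plain dictionary reads.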
  20. The new Meta API: Cached property

    • Avoids function call overhead
    • Uses internal __dict__
    • pip install cached-property (PyDanny)
    • Has its limitations
  21. The Meta API

    If this method gets executed, it must be the first ever call to _relation_tree. All other calls return the attribute directly.
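Why the method body runs only once can be shown with a minimal cached property built on the descriptor protocol. This is a sketch of the general technique, not the code from the slides or from the cached-property package; `ToyOptions` and its `computations` counter are hypothetical.

```python
class cached_property:
    """Minimal sketch of a cached property as a non-data descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Only reached while the instance __dict__ has no entry for the name.
        value = self.func(instance)
        # Store the result under the same name; the instance __dict__ now
        # shadows this descriptor, so later lookups skip __get__ entirely.
        instance.__dict__[self.name] = value
        return value


class ToyOptions:
    def __init__(self):
        self.computations = 0

    @cached_property
    def relation_tree(self):
        self.computations += 1
        return ("expensive", "result")


opts = ToyOptions()
first = opts.relation_tree   # runs the method and fills __dict__
second = opts.relation_tree  # plain attribute lookup, no function call
```

Deleting the entry from `opts.__dict__` is enough to invalidate the cache, since the next access falls through to the descriptor again.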
  22. The Meta API

    Diagram: Apps; aModel1, aModel2, aModel3, aModel4, each with its own RT Cache; aModel1._meta.relation_tree
  23. The Meta API

    Diagram: Apps; aModel1, aModel2, aModel3, aModel4, each with its own RT Cache; aModel1._meta.relation_tree
  24. The Meta API

    Diagram: Apps; aModel1, aModel2, aModel3, aModel4, each with its own RT Cache; aModel4._meta.relation_tree, aModel1._meta.relation_tree
  25. Cache invalidation: Relation Tree cache

    Diagram: (1) Apps.register_model(), (2) Apps.clear_cache(); aModel1, aModel2, aModel3, NewModel, each with its own RT Cache
  26. Cache invalidation

    Diagram: (1) Apps.register_model(), (2) Apps.clear_cache(), (3); aModel1, aModel2, aModel3, NewModel, each with its own RT Cache

    Invalidation is expensive, but it happens only on bootup, and is the price we pay for fewer bugs.
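The invalidation flow can be sketched with a toy registry. `ToyApps` and its dictionaries are hypothetical; only the method names `register_model()` and `clear_cache()` come from the slides. The sketch shows why registering a model has to drop every relation tree cache, not only the new model's.

```python
class ToyApps:
    """Hypothetical registry that invalidates per-model caches on registration."""

    def __init__(self):
        self.models = {}    # model name -> list of target model names
        self.rt_cache = {}  # model name -> cached tuple of referring models

    def register_model(self, name, targets=()):
        self.models[name] = list(targets)
        # A new model may point at any existing one, so drop every cache.
        self.clear_cache()

    def clear_cache(self):
        self.rt_cache.clear()

    def relation_tree(self, name):
        # Recompute lazily on the first lookup after an invalidation.
        if name not in self.rt_cache:
            self.rt_cache[name] = tuple(
                source
                for source, targets in self.models.items()
                if name in targets
            )
        return self.rt_cache[name]


apps = ToyApps()
apps.register_model("Brand")
apps.register_model("Item", targets=["Brand"])
before = apps.relation_tree("Brand")

apps.register_model("NewModel", targets=["Brand"])
after = apps.relation_tree("Brand")
```

Without the call to `clear_cache()`, the second lookup would return the stale tuple that does not know about `NewModel`.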
  27. Immutability: Why immutability

    • Memory efficiency: the Meta API is at the core of Django, therefore it must provide excellent memory management. Immutable data structures allocate exactly the required space, without over-allocating.
    • Reusability: by returning a reference to an immutable data structure, we guarantee that the end-user cannot manipulate the array, and therefore we can safely return the same reference.
    • Less bug-prone: personal experience here! API consumers will often retain the array as their own and manipulate the contents.
  28. Immutability: Immutability in the Meta API

    What do we do in the Meta API?
    • All return types are immutable, no copies are returned
    • All return types are cached
    • When possible, we use data structures that derive from set and tuple

    How does this impact how Django consumes the API?
    • Iteration over multiple API calls is done using itertools.chain()
    • Use generators everywhere when filtering API results
  29. Immutability: Immutability in the Meta API

    • Use itertools.chain() when possible to avoid allocating a new list
    • Use generator expressions to map or filter API results

    Downsides of generator expressions: no indexing and no repeated iteration. Currently this is rarely needed.
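A short sketch of both techniques, and of the single-iteration downside, using only the standard library. The two tuples are hypothetical stand-ins for cached, immutable API results.

```python
from itertools import chain

# Hypothetical cached, immutable results of two API calls.
fields = ("id", "name", "email")
many_to_many = ("groups", "permissions")

# chain() walks both tuples lazily instead of building fields + many_to_many.
all_fields = chain(fields, many_to_many)

# A generator expression filters without allocating an intermediate list.
short_names = (name for name in all_fields if len(name) <= 5)

result = list(short_names)    # consumes the generator
leftover = list(short_names)  # a second pass finds nothing left
```

A caller that needs indexing or several passes can materialise the generator once with `tuple(...)` or `list(...)` and work on that copy.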
  30. Immutability: Immutability in the Meta API

    We do even more! The API consumer should never pay the price of immutable internals. You can always make a copy for your own use. And in case you forget, we kindly remind you with an AttributeError.
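One way such a reminder could work is a tuple subclass whose mutating methods raise an explanatory AttributeError. The class below is a toy written for this illustration; its name, message and method list are assumptions and it is not presented as Django's implementation.

```python
class ImmutableList(tuple):
    """Toy tuple subclass that explains itself when someone tries to mutate it."""

    def _complain(self, *args, **kwargs):
        raise AttributeError(
            "This sequence is immutable; call list(...) to get your own copy."
        )

    # Route the usual list mutators to the same helpful error.
    append = extend = insert = pop = remove = sort = reverse = _complain
    __setitem__ = __delitem__ = _complain


fields = ImmutableList(["id", "name"])

try:
    fields.append("email")
except AttributeError as exc:
    message = str(exc)

# The consumer can always take a private, mutable copy.
own_copy = list(fields)
own_copy.append("email")
```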
  31. The new Meta API: What we have up till now

    • Reduced complexity of the previous API
    • Tests added to previous and current API
    • Performance increased compared to previous API
    • 465 commits in my second PR, as of today!
    • A fully working, refactored API, for what we have today in Django
    • ..But, not what we may want to have tomorrow

    10% performance increase (DjangoBench, median of 1000 runs for each test)
  32. Complexity: New Meta API entry-points (7)

    def get_fields(self, forward, reverse, ..):
    def get_field(self, field_name):
    def field_names(self):
    def fields(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def related_objects(self):
  33. The future Meta API: Boolean flags

    field.editable       Field is editable
    field.concrete       Field has a corresponding db column
    field.is_relation    Field has a relation with another model
    field.one_to_many    Cardinality 1-N
    field.many_to_one    Cardinality N-1
    field.many_to_many   Cardinality N-N
    field.one_to_one     Cardinality 1-1
  34. The future Meta API: Data flags

    field.name             Queryable name
    field.hidden           The field is used for another field’s functionality (e.g. GenericForeignKey)
    field.model            The model that contains the field
    field.referred_model   The model that a field points to (when the field has a relation)
  35. The future Meta API: Querying with get_fields() and field flags

    # Fetch all relations that go from A to B
    FIELDS = (f for f in A._meta.get_fields()
              if not f.hidden and f.is_relation and f.referred_model == B)

    # Fetch all fields to show on a form for A (including FKs)
    FIELDS = (f for f in A._meta.get_fields()
              if not f.hidden and f.editable and (not f.is_relation or f.one_to_many))

    # Fetch all fields that have a connected db column
    FIELDS = (f for f in A._meta.get_fields() if f.concrete)
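The same style of query can be tried without Django by faking the field objects. `ToyField`, the sample fields and this free-standing `get_fields()` are hypothetical; only the flag names (`concrete`, `editable`, `hidden`, `is_relation`, `referred_model`) are taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyField:
    """Hypothetical field carrying the flags named on the slides."""

    name: str
    concrete: bool = True
    editable: bool = True
    hidden: bool = False
    is_relation: bool = False
    referred_model: Optional[str] = None


def get_fields():
    """Stand-in for A._meta.get_fields(): returns an immutable tuple."""
    return (
        ToyField("id", editable=False),
        ToyField("title"),
        ToyField("brand", is_relation=True, referred_model="B"),
        ToyField("content_object", concrete=False, hidden=True, is_relation=True),
    )


# All relations that go from A to B.
relations_to_b = tuple(
    f.name
    for f in get_fields()
    if not f.hidden and f.is_relation and f.referred_model == "B"
)

# Visible, editable fields (a simplified form query).
form_fields = tuple(f.name for f in get_fields() if not f.hidden and f.editable)

# Fields backed by a database column.
db_columns = tuple(f.name for f in get_fields() if f.concrete)
```

All filtering happens in the caller, so the API itself only has to return every field with accurate flags.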
  36. Naming things: Major changes from yesterday to today

    1. Moving to a centralized entry-point: get_fields()
    2. Moving from flags to bit-fields
    3. Making the API even more sparse
    4. Going all the way down to 2 flags
  37. Naming things: Spotting the pattern in the old API

    def _fill_fields_cache(self):
    def get_fields_with_model(self):
    def get_concrete_fields_with_model(self):

    def _fill_m2m_cache(self):
    def get_m2m_with_model(self):

    def _fill_related_objects_cache(self):
    def get_all_related_objects(self, local_only=False, include_hidden=False,
    def get_all_related_objects_with_model(self, local_only=False, ..)

    def _fill_related_many_to_many_cache(self):
    def get_all_related_many_to_many_objects(self, local_only=False):
    def get_all_related_m2m_objects_with_model(self):
  38. Naming things: Compacting into a single get_fields() API

    def get_fields(fields, m2m, related_objects, related_m2m, with_models):

    • Less redundancy
    • A refactored version of the past API, nothing more
    • Some entry-points have unique flags, so generalizing can be very hard
  39. Naming things: Using bit-fields

    def get_fields(types=RELATED_OBJECTS, opts=INCLUDE_HIDDEN | INCLUDE_PROXY):

    • A flexible API
    • Requires imports
    • Entirely anti-pythonic
    • Causes problems with circular imports
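The trade-off between bit-fields and keyword flags can be sketched with the standard library's `enum.Flag`. Everything here is hypothetical: the `Include` enum, both functions and the field names they return only illustrate the calling style and do not reproduce any version of the Meta API.

```python
from enum import Flag


class Include(Flag):
    """Hypothetical option bits, loosely mirroring the names on the slide."""

    NONE = 0
    HIDDEN = 1
    PROXY = 2


def get_fields_bits(opts=Include.NONE):
    """Bit-field style: callers must import Include to build the argument."""
    names = ["id", "name"]
    if opts & Include.HIDDEN:
        names.append("_hidden_ptr")
    if opts & Include.PROXY:
        names.append("proxy_rel")
    return tuple(names)


def get_fields_flags(include_hidden=False, include_proxy=False):
    """Keyword-flag style: the same options without any extra import."""
    opts = Include.NONE
    if include_hidden:
        opts |= Include.HIDDEN
    if include_proxy:
        opts |= Include.PROXY
    return get_fields_bits(opts)


with_bits = get_fields_bits(Include.HIDDEN | Include.PROXY)
with_flags = get_fields_flags(include_hidden=True, include_proxy=True)
```

Both calls select the same fields, but the bit-field version forces every caller to import the constants, which is where the import and circular-import concerns on the slide come from.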
  40. Naming things: Making the API even more sparse

    def get_fields(pure_data, pure_m2m, pure_virtual,
                   forward_data, forward_m2m, forward_virtual,
                   related_data, related_m2m, related_virtual,
                   include_hidden, include_proxy, include_concrete):

    • Flags are far better: more pythonic and fewer imports
    • This matrix can describe exactly what we have now
    • This matrix may not describe what we want in the future
    • Field types and options are too sparse to be API parameters.
  41. Naming things: Moving to only 2 main field distinctions

    def get_fields(forward, reverse, include_hidden, include_parents)

    • Only separates the main 2 distinction points
    • The rest of the filtering is done outside the API
    • Far simpler and easier to maintain
    • We are not there yet, as this distinction may not exactly be what we want in the future (future ForeignKeys, future Virtual Fields)
  42. Naming things: Conclusion

    An open source project is nothing without its community. Please give me feedback on the Google Group or on IRC. If you are coming to the sprints and you have some ideas, or you want to have a chat, please do so!
  43. Naming things: A huge thanks

    Without these people, the project would not have gone so far: Russell Keith-Magee, Collin Anderson, Tim Graham, Loic Bistuer, Anssi, and many more.
    Without these people, I wouldn’t be speaking here: Marc Tamlyn, Dutch Django association, Ola Sitarska, and many more!

    Community. Daniel Pyrathon, @pirosb3