Immutable Ruby

Immutable Ruby Michael Fairley @michaelfairley

I work at Braintree Payments. We make it super easy
for businesses of any size to take credit card payments online.

We handle payments for thousands of really awesome companies (like
these).

Immutability Immutability, simply put, means that something can't change. In
Ruby, an immutable object would be one that can't be modified after it's created. Obviously, a program that doesn't change anything isn't particularly useful, but in small pockets, immutability can be a tool for helping you reason about and structure your code. I'm going to tell you some stories of bad code (full of mutable state) that I've written that came back to bite me, and some stories of good code that I've written in an immutable style and how it paid off.

Your code is full of immutability. Make it explicit. Your
programs already contain lots of data that you assume will never change, and I'd encourage you to make it explicit.

class Purchase < ActiveRecord::Base # t.integer :user_id # t.integer :price
# t.integer :item_id end Let's pretend you're running an online store, and you record all of your purchases in a purchase table. You probably never want to update the data in this table. If you were to come in and change the value of the "price" column, it wouldn't actually change how much you charged the user's credit card. When you're recording data that reflects events that have happened outside of your application (in a 3rd party or in the "real world"), you often want this data to be immutable.

class Purchase < ActiveRecord::Base include Immutable end It would be
nice if we could mix in a module that makes this model immutable.

class Purchase < ActiveRecord::Base include Immutable end module Immutable def
readonly? persisted? end end Luckily, we can make an Immutable module in 5 lines of code.

class Purchase < ActiveRecord::Base include Immutable end module Immutable def
readonly? persisted? end end purchase = Purchase.create(...) purchase.update_attributes(...) #=> ActiveRecord::ReadOnlyRecord And now we can create new records, but can't update or delete existing ones.
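To see why returning `persisted?` from `readonly?` gives "create yes, update no", here is a minimal plain-Ruby sketch. It does not use ActiveRecord: `TinyRecord` and this `ReadOnlyRecord` are hypothetical stand-ins that only imitate the idea of refusing to write when `readonly?` is true.

```ruby
# Hypothetical stand-in for ActiveRecord::ReadOnlyRecord.
class ReadOnlyRecord < StandardError; end

# Same module as on the slide.
module Immutable
  def readonly?
    persisted?
  end
end

# Hypothetical miniature "model" (an assumption for illustration only).
class TinyRecord
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
    @persisted = false
  end

  def persisted?
    @persisted
  end

  def readonly?
    false
  end

  # Refuses to write whenever the record reports itself as read-only.
  def save
    raise ReadOnlyRecord, "#{self.class} is marked as readonly" if readonly?
    @persisted = true
  end

  def update(new_attributes)
    raise ReadOnlyRecord, "#{self.class} is marked as readonly" if readonly?
    @attributes = attributes.merge(new_attributes)
    save
  end
end

class Purchase < TinyRecord
  include Immutable
end

purchase = Purchase.new(:price => 999)
purchase.save # allowed: the record is not persisted yet, so readonly? is false

error =
  begin
    purchase.update(:price => 1) # rejected: persisted records are read-only
    nil
  rescue ReadOnlyRecord => e
    e
  end
```

The first save goes through because a new record is not yet persisted; every later write is rejected.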

REVOKE UPDATE ON purchases FROM app; REVOKE DELETE ON purchases
FROM app; You could even go as far as giving your application a special user in the database and removing the ability to modify or delete your immutable tables at the database level.

# t.integer :item_id # t.string :status end Here, we've added a status field that will contain the state of the purchase (processing, shipped, refunded, etc.) and will change, unlike the other fields.

github.com/JackDanger/ immutable_attributes gem install immutable_attributes But what if you only
want to make certain fields immutable? There's a gem for that.

# t.integer :item_id # t.string :status attr_immutable :user_id attr_immutable :price attr_immutable :item_id end We can call `attr_immutable` for the fields...

# t.integer :item_id # t.string :status attr_immutable :user_id attr_immutable :price attr_immutable :item_id end purchase.update_attributes(:price => "9.99") #=> ImmutableAttributeError ...and when we try to modify them, we'll get an error.

In your application too! Your application code also has a
bunch of data that you don't really want to change.

def build_url(base, path, params={}) base << "/" unless path.start_with?('/') base
<< path base << "?" + params.to_param if params.present? base end Let's pretend you have a `build_url` method that takes a domain name, a path, and some optional query params, and will build a full URL out of them.

<< path base << "?" + params.to_param if params.present? base end build_url("http://example.com", "blog") #=> "http://example.com/blog" It builds simple URLs...

<< path base << "?" + params.to_param if params.present? base end build_url("http://example.com", "blog") #=> "http://example.com/blog" build_url("http://example.com", "/photos", :sort => "size") #=> "http://example.com/photos?sort=size" ...and ones that are a little more complex.

def example_url(path, params={}) build_url("http://example.com", path, params) end We then realize
we're using the same base URL a lot, so we add another method that has it hardcoded.

def example_url(path, params={}) build_url("http://example.com", path, params) end example_url("blog") #=> "http://example.com/blog"
example_url("/photos", :sort => "size") #=> "http://example.com/photos?sort=size" And we get the same URLs as before.

ROOT_URL = "http://example.com" def example_url(path, params={}) build_url(ROOT_URL, path, params) end
But we don't like the hardcoded URL in the middle of a method, so we pull it out into a constant (or an env var, or something loaded from a YAML config file).

example_url("blog") #=> "http://example.com/blog" The blog URL looks great.

example_url("blog") #=> "http://example.com/blog" example_url("/photos", :sort => "size") #=> "http://example.com/blog/photos?sort=size" But... the photos URL has "blog" in it. Let's try that again...

example_url("blog") #=> "http://example.com/blog" example_url("/photos", :sort => "size") #=> "http://example.com/blog/photos?sort=size" example_url("/photos", :sort => "size") #=> "http://example.com/blog/photos?sort=size/photos?sort=size" And things don't look great. :-/

DANGER WILL ROBINSON Something has gone very wrong.

def build_url(base, path, params={}) base << "/" unless path.start_with?('/') base << path base << "?" + params.to_param if params.present? base end So what's going on?

def build_url(base, path, params={}) base << "/" unless path.start_with?('/') base << path base << "?" + params.to_param if params.present? base end We are now only ever making a single instance of the string "http://example.com".

def build_url(base, path, params={}) base << "/" unless path.start_with?('/') base << path base << "?" + params.to_param if params.present? base end We then pass this same instance into build_url over and over.

def build_url(base, path, params={}) base << "/" unless path.start_with?('/') base << path base << "?" + params.to_param if params.present? base end And in build_url, these shovel operators are mutating this single instance of the string.

def build_url(base, path, params={}) base += "/" unless path.start_with?('/') base += path base += "?" + params.to_param if params.present? base end The fix is pretty simple: change the shovels to plus-equals, and instead of mutating `base`, we'll allocate a new string with the result.

But the developer who extracted the constant wasn't looking at (and shouldn't have to look at) the `build_url` method to realize that the string would be mutated.

ROOT_URL = "http://example.com".freeze def example_url(path, params={}) build_url(ROOT_URL, path, params) end
def build_url(base, path, params={}) base += "/" unless path.start_with?('/') base += path base += "?" + params.to_param if params.present? base end But they could have been defensive when they pulled the constant out and frozen it. Doing this would have caused our broken code to immediately throw an exception pointing out the mistake. You probably want to freeze constants like these so you don't unintentionally mutate them.
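Here is a small runnable sketch of that difference. To keep it free of Rails, it assumes `URI.encode_www_form` and `params.empty?` as stand-ins for ActiveSupport's `to_param` and `present?`; the name `build_url_mutating` is made up for the comparison.

```ruby
require "uri"

# The original, mutating version (shovel operators modify `base` in place).
# URI.encode_www_form is an assumed stand-in for ActiveSupport's to_param.
def build_url_mutating(base, path, params = {})
  base << "/" unless path.start_with?("/")
  base << path
  base << "?" + URI.encode_www_form(params) unless params.empty?
  base
end

# The fixed version: += builds a new string and rebinds the local variable.
def build_url(base, path, params = {})
  base += "/" unless path.start_with?("/")
  base += path
  base += "?" + URI.encode_www_form(params) unless params.empty?
  base
end

ROOT_URL = "http://example.com".freeze

# With a frozen constant, the mutating version fails loudly right away.
mutation_error =
  begin
    build_url_mutating(ROOT_URL, "blog")
    nil
  rescue FrozenError => e
    e
  end

# The non-mutating version works and leaves the constant untouched.
blog_url = build_url(ROOT_URL, "blog")
photos_url = build_url(ROOT_URL, "/photos", :sort => "size")
```

Without `.freeze`, the mutating version would have silently grown `ROOT_URL` on every call; with it, the first bad call raises.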

.deep_freeze with ice_nine There's one gotcha with freeze: it doesn't
freeze the instance variables, collection elements, etc. of the object you freeze. There's an ice_nine gem that adds a `deep_freeze` method that will recursively freeze all of these things.
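A quick sketch of the gotcha, plus a hand-rolled recursive freeze to show the idea. `deep_freeze_sketch` is a hypothetical helper written for this example, not ice_nine's implementation, and it only walks arrays and hashes (it ignores instance variables).

```ruby
# Hypothetical helper: recursively freezes arrays, hashes, and their contents.
def deep_freeze_sketch(obj)
  case obj
  when Hash
    obj.each do |key, value|
      deep_freeze_sketch(key)
      deep_freeze_sketch(value)
    end
  when Array
    obj.each { |element| deep_freeze_sketch(element) }
  end
  obj.freeze
end

# Plain freeze is shallow: the array is frozen, its elements are not.
shallow = [String.new("a"), String.new("b")].freeze
shallow.first << "!" # still allowed, mutates the element in place

# After a deep freeze, the nested strings are frozen as well.
deep = deep_freeze_sketch([String.new("a"), { :key => String.new("v") }])
```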

Values Values are objects whose identity is entirely based on
the data inside of them, rather than some external identity. So, ActiveRecord objects are not values, because even if two distinct AR objects contain the same data, they aren't equal. Their `id`s are what determine their identity. Values are also immutable. You work with values (like numbers and time) every day in Ruby, but you aren't limited to the value objects that Ruby provides you.

Address 210 E 400 S Salt Lake City, UT 84111

Point (10, 20) (7, 12, 3)

URI https://braintreepayments.com/docs/ruby

github.com/tcrayford/Values gem install values There's a great gem for creating
value objects called... Values

Point = Value.new(:x, :y) We create a new Point class
that has an x and y value. `Value` is a lot like `Struct`, except that the resulting objects are immutable.

Point = Value.new(:x, :y) origin = Point.new(0, 0) We make
a new point at the origin.

Point = Value.new(:x, :y) origin = Point.new(0, 0) origin.x #=>
0 origin.y #=> 0 And its x and y values are both 0.

0 origin.y #=> 0 elsewhere = Point.new(3, 4) We make another point with some different values.

0 origin.y #=> 0 elsewhere = Point.new(3, 4) elsewhere.x #=> 3 elsewhere.y #=> 4 Its values pop right out of it.

0 origin.y #=> 0 elsewhere = Point.new(3, 4) elsewhere.x #=> 3 elsewhere.y #=> 4 elsewhere.x = 10 #=> NoMethodError But, because these are values, we can't change the data.

0 origin.y #=> 0 elsewhere = Point.new(3, 4) elsewhere.x #=> 3 elsewhere.y #=> 4 elsewhere.x = 10 #=> NoMethodError elsewhere == Point.new(3, 4) #=> true And, the equality is based off of the data inside of it, not any kind of external identity (e.g. object_id). And, once you determine that two values are identical, you know they'll always be.

class Point < Value.new(:x, :y) def to_s "(#{x}, #{y})" end def
*(scale) Point.new(x * scale, y * scale) end end (Point.new(1, 2) * 3).to_s #=> "(3, 6)" Value objects can have "behavior", in the form of convenience methods. These methods can't modify the internal state of the object though.
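If you want to see what a value object boils down to without the gem, here is a rough plain-Ruby sketch. It is hand-written for illustration and is not how the Values gem is implemented: readers only, a frozen instance, and equality and hashing based on the data.

```ruby
class Point
  attr_reader :x, :y # readers only, so there is no x= or y=

  def initialize(x, y)
    @x = x
    @y = y
    freeze # instance variables can no longer be reassigned
  end

  # Equality is based on the data, not on object identity.
  def ==(other)
    other.class == self.class && other.x == x && other.y == y
  end
  alias_method :eql?, :==

  # Equal values must hash the same so they work as hash keys.
  def hash
    [self.class, x, y].hash
  end

  def to_s
    "(#{x}, #{y})"
  end

  # "Behavior" returns a new value instead of changing this one.
  def *(scale)
    Point.new(x * scale, y * scale)
  end
end

elsewhere = Point.new(3, 4)
scaled = Point.new(1, 2) * 3
lookup = { elsewhere => "found" }
```

Two separately constructed points with the same coordinates compare equal and find each other as hash keys, while `scaled` is a brand new point.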

Deflate bloated models with value objects "Skinny controller, fat model"
is something we hear in Rails-land all the time. Having logic in your models is definitely better than having logic in your controllers, but now there's a proliferation of apps with "god objects" that have thousands of lines of code and hundreds of methods. (It's often your `User` class, or some other model central to your domain). Value objects are a natural way to pull logic out of your bloated models.

class User < ActiveRecord::Base # t.text :shipping_street # t.text :shipping_city
# t.text :shipping_state # t.text :shipping_zip_code # ... def calculate_shipping_price some_calculation end end Let's take a look at how we can decompose one of these bloated models by using value objects. We have the canonical bloated model, `User`, and it has attributes for its shipping address, and a method to calculate the cost of shipping something to this user.

Address = Value.new( :street, :city, :state, :zip_code ) We can
instead make an Address value to store this data and contain some of this behavior.

class User < ActiveRecord::Base composed_of :shipping_address, :class_name => "Address", :mapping
=> [ ["shipping_street", "street"], ["shipping_city", "city"], ["shipping_state", "state"], ["shipping_zip_code", "zip_code"] ] end We use ActiveRecord's composed_of helper to map our database fields to the value object's fields.

user.shipping_street = "210 E 400 S" user.shipping_city = "Salt Lake
City" user.shipping_state = "UT" user.shipping_zip_code = "84111" Now, when the fields are assigned to (from a form, or as it comes out of the database)...

City" user.shipping_state = "UT" user.shipping_zip_code = "84111" user.shipping_address #=> #<Address: 0x007f7fd4a3dee0 @street="210 E 400 S", @city="Salt Lake City", @state="UT", @zip_code="84111"> We can ask for the shipping address, and we'll get out an Address.

City" user.shipping_state = "UT" user.shipping_zip_code = "84111" user.shipping_address #=> #<Address: 0x007f7fd4a3dee0 @street="210 E 400 S", @city="Salt Lake City", @state="UT", @zip_code="84111"> user.shipping_address = Address.new(...) And we can assign a new Address into the field.

user.calculate_shipping_price There's some pain around the original User#calculate_shipping_price method.

require 'spec_helper' describe User do # hundreds of other tests
describe '#calculate_shipping_price' do it "calculates the correct price" do user = FactoryGirl.create(:user, :shipping_street => "210 E 400 S", :shipping_city => "Salt Lake City", :shipping_state => "UT", :shipping_zip_code => "84111" ) cost = user.calculate_shipping_price cost.should == "4.55" end end end Testing the version of the method that lives on `User` isn't _too_ difficult, but there are a few unpleasantries.

describe '#calculate_shipping_price' do it "calculates the correct price" do user = FactoryGirl.create(:user, :shipping_street => "210 E 400 S", :shipping_city => "Salt Lake City", :shipping_state => "UT", :shipping_zip_code => "84111" ) cost = user.calculate_shipping_price cost.should == "4.55" end end end We have to include spec_helper, which is going to fire up an entire Rails environment and make our test slow to start.

describe '#calculate_shipping_price' do it "calculates the correct price" do user = FactoryGirl.create(:user, :shipping_street => "210 E 400 S", :shipping_city => "Salt Lake City", :shipping_state => "UT", :shipping_zip_code => "84111" ) cost = user.calculate_shipping_price cost.should == "4.55" end end end We're in a massive file with hundreds of other tests, and we've made it even worse by adding another.

describe '#calculate_shipping_price' do it "calculates the correct price" do user = FactoryGirl.create(:user, :shipping_street => "210 E 400 S", :shipping_city => "Salt Lake City", :shipping_state => "UT", :shipping_zip_code => "84111" ) cost = user.calculate_shipping_price cost.should == "4.55" end end end We have to use FactoryGirl to build up a model, and we have to talk to the database to save it.

user.calculate_shipping_price vs. user.address.calculate_shipping_price But I'm going to propose that this
code, while not as succinct or direct, is much nicer in the long run.

describe Address do describe '#calculate_shipping_price' do it "calculates the correct
price for here" do address = Address.new( :street => "489 Elizabeth Street", :city => "Melbourne", :state => "VIC", :zip_code => "3000" ) cost = address.calculate_shipping_price cost.should == "4.55" end end end If we have a separate `Address` class, then the tests become a lot nicer. There are no dependencies on external libraries, no special factories, and we end up with both a class and a test that are quite small and isolated.

user.calculate_shipping_price vs. user.address.calculate_shipping_price But "nicer tests" isn't a good enough
reason for a change like this. Fortunately, it also makes our application easier to extend.

user.calculate_shipping_price vs. user.address.calculate_shipping_price user.addresses[2].calculate_shipping_price Let's say a user had multiple
addresses: it's obvious how to make the 2nd version work, but I don't know what I could do to the 1st version that would leave me happy.

user.calculate_shipping_price vs. user.address.calculate_shipping_price user.addresses[2].calculate_shipping_price business.address.calculate_shipping_price Or, if we want to
be able to calculate shipping costs for domain models besides `User`, again, the 2nd version is incredibly easy to extend, but to make the 1st work, we'd probably have to extract some sort of `Shippable` module that gets mixed into both `User` and `Business` and is not at all straightforward to test.

user.calculate_shipping_price vs. user.address.calculate_shipping_price user.addresses[2].calculate_shipping_price business.address.calculate_shipping_price item.calculate_shipping(user.address) item.calculate_shipping(user.addresses[2]) item.calculate_shipping(business.address) And if
you decided you needed to have different shipping prices for different items, you could move the `calculate_shipping_price` to the items, and have the method take an `Address`. And this change is fairly non-invasive because we're passing around value objects rather than full-blown models.

user.shipping_street = user.billing_street user.shipping_city = user.billing_city user.shipping_state = user.billing_state user.shipping_zip_code
= user.billing_zip_code One more example of a change that the value-based version is resilient to. Let's think about how we would implement the "my shipping address is the same as my billing address" check box. It's pretty ugly to have to assign each of the address fields individually, and if we ever add a new field to the addresses, it's unlikely that we'd remember to come update this code.

user.shipping_address = user.billing_address If the addresses are composed values, then
this can just be a single, intentional line of code. If we ever add any more fields to address, we don't have to remember to update this assignment.
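To make the mapping idea concrete, here is a rough plain-Ruby approximation of what a composed address does for us. It is only a sketch under my own assumptions, not ActiveRecord's `composed_of` implementation: the flat `shipping_*` and `billing_*` accessors stand in for database columns, and `read_address`/`write_address` are hypothetical helpers.

```ruby
# A frozen Struct instance stands in for the Address value object.
Address = Struct.new(:street, :city, :state, :zip_code)

class User
  # Flat fields, standing in for the database columns.
  attr_accessor :shipping_street, :shipping_city, :shipping_state, :shipping_zip_code,
                :billing_street, :billing_city, :billing_state, :billing_zip_code

  def shipping_address
    read_address(:shipping)
  end

  def billing_address
    read_address(:billing)
  end

  def shipping_address=(address)
    write_address(:shipping, address)
  end

  private

  # Builds an immutable Address from the prefixed flat fields.
  def read_address(prefix)
    values = Address.members.map { |field| public_send("#{prefix}_#{field}") }
    Address.new(*values).freeze
  end

  # Copies every Address field back onto the prefixed flat fields.
  def write_address(prefix, address)
    Address.members.each do |field|
      public_send("#{prefix}_#{field}=", address.public_send(field))
    end
  end
end

user = User.new
user.billing_street = "210 E 400 S"
user.billing_city = "Salt Lake City"
user.billing_state = "UT"
user.billing_zip_code = "84111"

# The "same as billing" check box becomes one assignment.
user.shipping_address = user.billing_address
```

Because the helpers iterate over `Address.members`, adding a field to `Address` (and the matching flat accessors) needs no change to the assignment itself.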

Event Sourcing Next up... event sourcing!

Capture all changes to application state as a sequence of
events Event sourcing is when you capture all changes to an application's state as a sequence of immutable events. This is best explained with an example...

Opened account $1000 Balance: $1000 Bank accounts are perfect for
event sourcing. You open an account and put $1000 in it.

Opened account $1000 Bought conference ticket -$595 Balance: $405 You
buy a conference ticket, and your balance goes down.

Opened account $1000 Bought conference ticket -$595 Paycheck $4000 Balance:
$4405 You get paid; it goes up.

Opened account $1000 Bought conference ticket -$595 Paycheck $4000 Bought
a book -$15 Balance: $4390 You buy a book...

a book -$15 Returned the book $15 Balance: $4405 and return it, and your balance goes back to what it was before.

Events Debits and credits (12/6/2012 16:30, "15.00", "Book") The events
in this system are the transactions.

Derived state Balance And the derived state is the balance.

a book -$15 Returned the book $15 Balance: $???? We say that the balance is derived, because if I take it away from you, you can recalculate it from the events (the source of truth).

Can reconstruct past states What was 110's balance 7 days
ago? We can ask questions about the past. To answer this one, we would just look at all the events up until 7 days ago, and we'd have our answer.

Events can be reverted Charge was refunded In an event
sourced system, events are reverted (i.e., by inserting an opposite event), not deleted (because they're immutable).

Replay Debug errors & test new code Events can also
be replayed. If there's an error, the bank's programmers could grab the event log and replay it up until the point in time where the error occurred, and they'd have the system in the exact state it was in in production.
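Here is a minimal sketch of the bank account example in plain Ruby. The names (`Event`, `balance`, `revert`), the dates, and the choice of integer cents are my own assumptions for illustration; the amounts are the ones from the slides.

```ruby
# An event is an immutable record of something that happened.
Event = Struct.new(:at, :amount_cents, :description)

DAY = 24 * 60 * 60
start = Time.utc(2012, 12, 1)

# The event log is the source of truth. It is only ever appended to.
log = [
  Event.new(start, 100_000, "Opened account"),
  Event.new(start + 1 * DAY, -59_500, "Bought conference ticket"),
  Event.new(start + 2 * DAY, 400_000, "Paycheck"),
  Event.new(start + 5 * DAY, -1_500, "Bought a book")
].map(&:freeze).freeze

# Derived state: replay the events (optionally only up to a point in time).
def balance(events, as_of = nil)
  relevant = as_of ? events.select { |event| event.at <= as_of } : events
  relevant.sum(&:amount_cents)
end

# Reverting appends an opposite event instead of deleting the original.
def revert(events, event, at)
  opposite = Event.new(at, -event.amount_cents, "Reverted: #{event.description}")
  (events + [opposite.freeze]).freeze
end

current = balance(log)                       # after buying the book
day_one = balance(log, start + 1 * DAY)      # a question about the past
log_after_return = revert(log, log.last, start + 6 * DAY)
after_return = balance(log_after_return)     # the book has been returned
```

The balance is never stored; it can always be recomputed from the log, for now or for any earlier point in time.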

git There's another event sourced system that most of you
interact with every day: git!

Events Commits Commits are the events.

Derived state Working directory Your working directory is the derived
state.

Can reconstruct past states What did the code look like
after commit a321bd? You can reconstruct past state and ask questions about the past.

Events can be reverted git revert git revert, rather than
deleting a commit, inserts new commits that do the opposite of the one you're reverting.

Replay git rebase And git rebase is a form of
replay.

Family Tree At a previous job at a family history
startup, we built a family tree feature, and we decided to event source all of the modifications to the family trees on our site. This turned out to be a really good decision.

Safety net We wanted to store the family tree in
a fancy pants graph database, but we didn't trust it (and our administration of it) to not lose our data. We stored the event log in Postgres and the resulting application state in the graph database. If the graph database ever went kaput, we would still have a canonical version of the data in reliable storage.

Audit log Once or twice, someone vandalized one of the
family trees. It was incredibly easy to find all of the events that the vandal triggered and call the revert! method on them.

Escape hatch We eventually decided to move the family tree
back into Postgres, and rather than having to do a complicated ETL to get the data out of the graph DB and into Postgres, we changed our code to write the computed data into Postgres. We then replayed the entire event log, and our Postgres DB then held all of our data, in the most recent state.

Immutability lets you break the rules There's a bunch of
rules in computer science and software engineering that immutability lets you sidestep.

"There are only two hard problems in Computer Science: cache
invalidation and naming things." There's this famous quote.

"There are only two hard problems in Computer Science: cache
invalidation and naming things." But if your cached data is never going to change, you're never going to have to invalidate it.

Normalization Why do we normalize our databases?

"The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table" So that we won't have to make updates in more than one place. Well... if you're not making updates, then you'll never have to do it in more than one place, and thus normalization isn't necessary.

Thread Safety Thread safety issues are almost entirely caused by
shared mutable state. Immutable objects are automatically thread safe.

Downsides :-( As with everything, there are tradeoffs when you
use immutability.

Performance Due to extra allocations and copying, code that makes
use of immutable data will almost always be slower and use more memory than code that mutates data in-place.

Flexibility You're constraining yourself when you use immutable data. Your
domain, your performance requirements, or libraries you're using might not fit with these constraints.

Ruby Ruby is an incredibly ﬂexible language. Where other languages
let you declare variables as const or final, Ruby will gladly let you reach inside objects and change their instance variables, reassign constants, and even unfreeze frozen objects.
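A tiny sketch of the first of those points, using a made-up `Account` class: leaving out the setter does not make an object immutable, because `instance_variable_set` reaches right past it. Freezing does stop this particular route. The sketch does not attempt the constant reassignment or unfreezing tricks.

```ruby
# Hypothetical class with a reader but no setter.
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end
end

# No balance= method exists, yet the state can still be changed from outside.
account = Account.new(100)
account.instance_variable_set(:@balance, 0)

# A frozen object rejects the same call.
frozen_account = Account.new(100).freeze
error =
  begin
    frozen_account.instance_variable_set(:@balance, 0)
    nil
  rescue FrozenError => e
    e
  end
```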

Deletion Deletion is a form of mutation, and you almost
always want user generated data to be deletable, meaning you can't cache it forever, denormalize it, etc.

Next steps http://goo.gl/Esa7r If you find any of these ideas
interesting, I have some pointers to things you can read or watch or explore to learn more. (This link in the bottom right takes you to a page that has links to everything I'm about to mention.)

Clojure Haskell Erlang http://goo.gl/Esa7r Learn one of these programming languages.
Immutability is central to all of them, and they make you jump through hoops to change state. You might find this impractical for your day to day programming, but learning at least one of them will help you understand immutability more deeply and

Rich Hickey The Value of Values Simple Made Easy The
Database as a Value Persistent Data Structures and Managed References http://goo.gl/Esa7r Rich Hickey is the creator of Clojure, and he has a handful of really good talks centered around immutability.

Value Objects Domain Driven Design c2 wiki http://goo.gl/Esa7r DDD and
the c2 wiki have a lot to say about value objects.

Gary Bernhardt Functional Core/Imperative Shell Boundaries http://goo.gl/Esa7r Gary Bernhardt has
an interesting idea on how to structure functional/immutable code and imperative/mutable code in an application together, and he explores this idea in depth in "Boundaries".

Event Sourcing Martin Fowler http://goo.gl/Esa7r Martin Fowler has the canonical
text on event sourcing on his website.

Thanks! @michaelfairley http://goo.gl/Esa7r

Bonus round! Bonus round!

Persistent Data Structures Persistent Data Structures are immutable data structures.
When you "modify" one of them, you actually get a new copy of the data and the original version remains unchanged. "Persistent" here shouldn't be confused with the term that means a database writes to disk, but rather it means that it sticks around.

Hamster github.com/harukizaemon/hamster Hamster is an awesome implementation of persistent data structures in
Ruby.

foo = Hamster.vector(1, 2, 3) We make a vector (similar
to an array) with 1, 2, and 3 in it.

foo = Hamster.vector(1, 2, 3) foo #=> [1, 2, 3]

bar = foo.add(4) When we add 4 to it, we assign the result of that into bar.

bar = foo.add(4) bar #=> [1, 2, 3, 4] bar now contains 1, 2, 3, 4

bar = foo.add(4) bar #=> [1, 2, 3, 4] foo #=> [1, 2, 3] But foo still has 1, 2, 3. It hasn't changed.

bar = foo.add(4) bar #=> [1, 2, 3, 4] foo #=> [1, 2, 3] baz = foo.set(1, 12) And we can "change" one of the elements.

bar = foo.add(4) bar #=> [1, 2, 3, 4] foo #=> [1, 2, 3] baz = foo.set(1, 12) baz #=> [1, 12, 3] baz has the modification.

bar = foo.add(4) bar #=> [1, 2, 3, 4] foo #=> [1, 2, 3] baz = foo.set(1, 12) baz #=> [1, 12, 3] foo #=> [1, 2, 3] But foo remains the same.

To help explain how this is useful, the Three Stooges
are going to lend me a hand.

m1 = Movie.new( :name => "Soup to Nuts", :cast =>
Hamster.set(:moe, :shemp, :larry) ) Moe, Shemp, and Larry were in a movie called "Soup to Nuts".

Hamster.set(:moe, :shemp, :larry) ) m2 = Movie.new( :name => "Meet the Baron", :cast => m1.cast.remove(:shemp).add(:curly) ) In a later movie, "Meet the Baron", Shemp left, and Curly became the 3rd stooge.

Hamster.set(:moe, :shemp, :larry) ) m2 = Movie.new( :name => "Meet the Baron", :cast => m1.cast.remove(:shemp).add(:curly) ) m3 = Movie.new( :name => "Gold Raiders", :cast => m2.cast.remove(:curly).add(:shemp) ) And then in "Gold Raiders", Shemp came back, and Curly was out again.

m1.cast #=> {:moe, :larry, :shemp} m2.cast #=> {:moe, :larry, :curly}
m3.cast #=> {:moe, :larry, :shemp} If we had been using mutable data structures, these cast lists would've clobbered each other when they were shared between the movies.
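You can get the same "nobody clobbers anybody" behavior from the standard library by freezing each set and only using operators that return new sets. This sketch uses Ruby's `Set` rather than Hamster, and the `Movie` struct is a stand-in I made up; note that `-` and `+` here make full copies, without the structural sharing that Hamster does.

```ruby
require "set"

# Hypothetical stand-in for the Movie class on the slides.
Movie = Struct.new(:name, :cast)

m1 = Movie.new("Soup to Nuts", Set[:moe, :shemp, :larry].freeze)

# Set#- and Set#+ return new sets and leave the receiver alone.
m2 = Movie.new("Meet the Baron", ((m1.cast - [:shemp]) + [:curly]).freeze)
m3 = Movie.new("Gold Raiders", ((m2.cast - [:curly]) + [:shemp]).freeze)
```

Each movie keeps its own cast even though every later cast was derived from an earlier one.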

"So like, Hamster is just calling .dup a bunch, right?"
Nope! There's some really cool Computer Science going on here.

old = Hamster.vector(1,2,3,4,5,6,7) 1 2 3 4 5 6 7
old (This is an approximation of what's actually happening) Here, we have a vector of the numbers 1 through 7. They're actually stored as the leaves of a tree, and `old` points to the root of this tree.

old = Hamster.vector(1,2,3,4,5,6,7) new = old << 8 1 2
3 4 5 6 7 old When we append 8 on to this vector.

3 4 5 6 7 8 old new We end up with a new tree. But it's not entirely new.

3 4 5 6 7 8 old new All of the nodes in red are shared between both the old and the new version of the vector. This minimizes both the CPU and memory requirements for these data structures (as opposed to .duping them).
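Structural sharing is easiest to see in a much simpler structure than the tree described above. This sketch is a persistent singly linked list, written for illustration only (it is not Hamster's code and not the tree from the slides): prepending allocates exactly one new node and reuses the entire old list as its tail. `Node`, `cons`, and `to_array` are names I made up.

```ruby
# One immutable list cell: a value plus a reference to the rest of the list.
Node = Struct.new(:value, :rest)

# Prepends a value by allocating a single new node that points at the old list.
def cons(value, rest)
  Node.new(value, rest).freeze
end

# Walks the list to produce a plain array, for inspection.
def to_array(node)
  items = []
  while node
    items << node.value
    node = node.rest
  end
  items
end

old_list = cons(2, cons(3, nil))
new_list = cons(1, old_list)
```

`new_list.rest` is the very same object as `old_list`, so nothing was copied, and because every node is frozen, sharing it is safe.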
