This is the story of scaling your application: the tactics involved in adding new databases, and how to overcome the fear of adding new pieces to your stack.
Your application gets some attention and your scaling story begins. Some kinds of data start to grow really quickly. You find hotspots, track down the queries that aren't so fast, and add an index in your database to make each query faster. Rinse, repeat. Most apps don't go past this point.
If your app continues to grow, you'll find yourself extracting indexes into specialized storage. You'll replace indexed queries (or queries that are difficult to index) with caches like Memcached, Redis, or maybe something custom.
At some point, your application will accrue more work than you really want to do during a transaction or request. That's when you want to go slightly asynchronous: queue work up and process it out-of-process. Delayed Job is a great way to start; then move up to something like Resque or RabbitMQ.
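To make that concrete, a Resque job is just a class with a queue name and a `perform` method; the job below is hypothetical:

```ruby
# A hypothetical job: notify a user's friends outside the request cycle.
class FriendNotifier
  @queue = :notifications

  def self.perform(user_id)
    User.find(user_id).notify_friends!
  end
end

# In the controller, enqueue the work instead of doing it inline:
Resque.enqueue(FriendNotifier, user.id)
```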
Once you've got caches and queues in the mix, it's important to start thinking about the observability of your data: when a write becomes visible to readers. Depending on how you read and write data, you can get confusing results based on timing; for example, a user updates a setting and then immediately reads back a stale cached copy that hasn't been invalidated yet. The good news is that lots of distributed databases are explicit about this, so it's easy to read up on. The bad news is that it's a bit of a brain-bender.
To connect to your second database, you need a gem. This is usually a pretty simple task; there's usually a widely preferred library for your database. The main caveat is if you're not on MRI or you're trying to go non-blocking; in that case, you'll want to take more care in selecting a library.
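If Redis or Memcached is the database you're adding, it's a line or two in your Gemfile:

```ruby
# Gemfile
gem 'redis'   # the widely preferred Redis client
gem 'dalli'   # a solid Memcached client
```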
Once you've got a client library, you'll end up inventing a configuration file for it. Mostly I see people doing something like `database.yml`, but plain-old-Ruby works great too. Make sure it's easy to configure different environments. Don't worry about avoiding global references to connections; that hasn't seemed to hurt us at Gowalla.
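A sketch of the plain-old-Ruby approach, assuming Redis and an illustrative `config/redis.yml` with one section per environment:

```ruby
# config/initializers/redis.rb
#
# config/redis.yml looks something like:
#   development:
#     host: localhost
#     port: 6379
#   production:
#     host: redis.internal
#     port: 6379
config = YAML.load_file(Rails.root.join('config', 'redis.yml'))[Rails.env]
$redis = Redis.new(:host => config['host'], :port => config['port'])
```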
With the connection configured, it's time to use it in application code. We've been using direct access through a global variable at Gowalla and it hasn't bitten us. I'd like to see us adopt something like redis-objects or toystore to do domain modeling against our uses of Memcached and Redis, but it's definitely not something that's holding us back.
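For contrast, here's direct global access next to a redis-objects sketch; the followers set is illustrative:

```ruby
# Direct access through the global connection; bare, but it works:
$redis.sadd("user:#{user.id}:followers", follower.id)

# Domain modeling with redis-objects:
class User < ActiveRecord::Base
  include Redis::Objects
  set :follower_ids   # backed by a Redis set, keyed off the model's id
end

user.follower_ids << follower.id
```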
Once you're using your new database, it's time to deploy. Get your ops person to set up the database, figure out exactly what steps you need to roll the new code out to production, and then go for it.
After a while, you'll find you need to rejigger how your data is stored. Absent migrations à la ActiveRecord, you've got a couple of options. One is read-repair, where you make your application code resilient to different versions of a data structure and only update the stored copy on writes. Another is to version the key you're storing data under and increment the key version when you change the structure. Read-repair looks something like this:
```ruby
# domain object...
def by_email
  obj = CACHE.get("user:#{id}:email")
  if obj.has_key?(:receive_friend_notice?)
    UserEmailPreference.from_hash(obj)
  else
    # Migrate on every read
    UserEmailPreference.migrate(obj)
  end
end
```
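Key versioning is even simpler. A sketch, with an illustrative version constant:

```ruby
# Bump the version whenever the stored structure changes; old keys are
# never read again and eventually expire or get evicted from the cache.
EMAIL_PREFS_VERSION = 2

def by_email
  UserEmailPreference.from_hash(CACHE.get("user:#{id}:email:v#{EMAIL_PREFS_VERSION}"))
end
```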
There's another level of tactics that's really important as you're mixing your persistence cocktail. It's easy to develop anxiety as you get close to deploying your new database. There's so much that can go wrong, so much uncertainty. You need tactics that let you overcome THE FEAR.
Start with extremely low stakes. New features, or features you're not sure you'll always need, work great. The important part is that it's something with low risk. Low risk means you can push the envelope a bit, which is exactly what adding a new database involves.
With those training wheels in place, you need to know how things are working. You need numbers on how often things happen and how much time they take when they do. Use Scout, New Relic RPM, or log inspection for this. You'll also want to log the things you're unsure about. Log profusely, and get handy with grep, sed, and awk for digesting those logs. We wrap our Redis commands in ActiveSupport::Notifications events with something like this:
```ruby
# An illustrative proxy around the raw Redis connection; the class name
# here is assumed, but the instrumentation is the interesting part.
class InstrumentedRedis
  def method_missing(command, *args, &block)
    ActiveSupport::Notifications.instrument(
      "request.redis",
      :command => command,
      :args => args
    ) do
      connection.send(command, *args, &block)
    end
  end

  attr_accessor :connection
end
```
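The matching subscriber can then count and time every command as it happens; a sketch:

```ruby
# Log how long each Redis command takes.
ActiveSupport::Notifications.subscribe("request.redis") do |name, start, finish, id, payload|
  ms = (finish - start) * 1000
  Rails.logger.info("redis command=#{payload[:command]} duration=#{ms.round}ms")
end
```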
And for the things you're unsure about, rescue, log a trimmed backtrace, and move on:

```ruby
# The method name is illustrative; the rescue is the interesting part.
def recommendations_for(user)
  # ... the risky call into the new database ...
rescue => e
  backtrace = Rails.
    backtrace_cleaner.
    clean(e.backtrace).
    take(5).
    join(' | ')
  logger.warn("Recommendation error: you should look into it")
  logger.warn(e.message)
  logger.warn(backtrace)
end
```
Sometimes a feature doesn't work out so well. In that case, you want the ability to turn features on and off willy-nilly. Feature toggles make this really easy. Branching in code isn't the prettiest thing, but it's a great safety net. The other great thing about toggles is that you can roll out to more and more users, making it easy to ease a feature in rather than hoping it works the first time.
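The `$feature` global you'll see below is one way to wire toggles up. A minimal sketch; in production you'd likely reach for a library like rollout:

```ruby
# A bare-bones percentage-based feature toggle.
class FeatureToggle
  def initialize
    @percentages = Hash.new(0)
  end

  def activate(feature, percentage = 100)
    @percentages[feature] = percentage
  end

  def active?(feature, user = nil)
    threshold = @percentages[feature]
    user ? user.id % 100 < threshold : threshold >= 100
  end
end

$feature = FeatureToggle.new
$feature.activate(:fantastic_read, 10)  # on for roughly 10% of users
```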
When you're ready to use your new database for critical features, you can ease into it with two techniques. Start off by writing data to your existing database _and_ the new one. Once you've got the new writes debugged and working, start doing reads from the new database but discarding the results. Debug, optimize, and then remove the double write, only using the new system. Success!
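A sketch of both steps, with `$new_db` and the `Checkin` model standing in for your own stores:

```ruby
# Step one: double writes. The old database stays the source of truth.
def save_checkin(attributes)
  checkin = Checkin.create!(attributes)          # existing relational write
  $new_db.save('checkins', checkin.attributes)   # duplicate write to the new system
  checkin
end

# Step two: shadow reads. Exercise the new system, discard its answer.
def find_checkin(id)
  begin
    $new_db.find('checkins', id)                 # result intentionally ignored
  rescue => e
    Rails.logger.warn("shadow read failed: #{e.message}")
  end
  Checkin.find(id)                               # still serve from the old system
end
```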
Gate those new reads behind a toggle so you can ramp up slowly:

```ruby
def show
  # The surrounding action is illustrative; the toggle is the point.
  if $feature.active?(:fantastic_read, @user)
    User.from_fantastic(@user.id)
  end
end
# Now watch your performance metrics
# and see how your throughput and
# latency change. If all goes well,
# turn it on for more users and see
# if you can break it.
```
The last tactic, of course, is iteration. Depending on the scope of your project, it could be weeks or months before the new thing is the thing. Every day, move the ball forward. Every day, make it better. Every day, figure out how to take the next step without shooting yourself in the foot. Every day, deliver business value.
Grow:

1. Prototype it
2. Ship it
3. Convert ad-hoc queries to indexes
4. Extract indexes into other systems
5. Queue it, work it
6. Go asynchronous
7. Relax consistency

Integrate:

- Client libraries
- Configuration
- Application code
- Deployment
- Data migration

Iterate, a lot:

- Instrument and log everything

So here's your map. As you can see, it's all interconnected. That's the way of things. I think it's neat.