Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
lpw-2012
Search
Oleg Komarov
November 24, 2012
Programming
1
330
lpw-2012
Reliable Cron Jobs in Distributed Environment
Oleg Komarov
November 24, 2012
Tweet
Share
More Decks by Oleg Komarov
See All by Oleg Komarov
yapc::eu 2013
komarov
0
180
Exploring Plack Middlewares
komarov
0
150
yapc_eu_2012
komarov
2
510
Other Decks in Programming
See All in Programming
Serving TUIs over SSH with Go
caarlos0
0
650
On-the-fly Suggestions of Rewriting Method Deprecations
ohbarye
3
5.4k
Browser and UI #2 HTML/ARIA
ken7253
2
180
エンジニアが挑む、限界までの越境
nealle
1
330
Beyond_the_Prompt__Evaluating__Testing__and_Securing_LLM_Applications.pdf
meteatamel
0
110
By the way Google Cloud Next 2025に行ってみてどうだった
ymd65536
0
130
ComposeでのPicture in Picture
takathemax
0
140
Lambda(Python)の リファクタリングが好きなんです
komakichi
5
270
generative-ai-use-cases(GenU)の推しポイント ~2025年4月版~
hideg
1
390
Instrumentsを使用した アプリのパフォーマンス向上方法
hinakko
0
250
Cloudflare Workersで進めるリモートMCP活用
syumai
5
450
Bedrock × Confluenceで簡単(?)社内RAG
iharuoru
1
120
Featured
See All Featured
The Invisible Side of Design
smashingmag
299
50k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
179
53k
Scaling GitHub
holman
459
140k
Code Review Best Practice
trishagee
68
18k
Visualization
eitanlees
146
16k
Designing for humans not robots
tammielis
253
25k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
[RailsConf 2023] Rails as a piece of cake
palkan
54
5.5k
Designing Experiences People Love
moore
142
24k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
Into the Great Unknown - MozCon
thekraken
38
1.8k
Transcript
Reliable Cron Jobs in Distributed Environment Oleg Komarov 2012-11-24 1/26
Presentation available at https://speakerdeck.com/komarov/lpw-2012 http://bit.ly/VLuT6g 2/26
Context 3 independent projects with shared infrastructure • over 30
boxes • over 200 scripts, 30K+ SLOC • packaged in appr. 20 deb-packages 3/26
TL;DR Reliable Cron Jobs in Distributed Environment 4/26
TL;DR Reliable Cron Jobs in Distributed Environment ... are HARD
to get right 4/26
Cron Jobs in a Vacuum • locks • logging and
output • monitoring • profiling 5/26
Logging and Output • log START and FINISH • log
enough details 6/26
Logging and Output • log START and FINISH • log
enough details • use log + STDERR for important things • use MAILTO to catch that output 6/26
Logging and Output • log START and FINISH • log
enough details • use log + STDERR for important things • use MAILTO to catch that output 6/26
Monitoring • be confident that it actually works • it
must not fail when you system fails • have a plan of action 7/26
Monitoring • be confident that it actually works • it
must not fail when you system fails • have a plan of action 7/26
What to monitor • hardware errors • free disk space
• load • crond is alive • age of generated file, queue size, etc. 8/26
Profiling • Does it need 1GB or 10GB? • What
does it take so long to complete? • How many db queries does it run? 9/26
Profiling • Does it need 1GB or 10GB? • What
does it take so long to complete? • How many db queries does it run? Measure and improve 9/26
More to consider • crash-safe • documentation • parallel execution
• resource limits (ulimit/cgroups) 10/26
Deployment Packages 11/26
Deployment Boxes 12/26
Cron Package Just populate my-project-scriptsN.cron.d file 13/26
Cron Package Just populate my-project-scriptsN.cron.d file Don’t write it by
hand, do it automatically 13/26
Cron Package Just populate my-project-scriptsN.cron.d file Don’t write it by
hand, do it automatically Put some METADATA in your scripts 13/26
Metadata =head1 METADATA <crontab> package: scriptsN params: --mod 2 --rem
0 time: */2 * * * * </crontab> <crontab> package: scriptsN params: --mod 2 --rem 1 time: */2 * * * * </crontab> =cut 14/26
Simple Setup As simple as possible: one box per package
15/26
Simple Setup As simple as possible: one box per package
apt-get purge && kill (or wait) && apt-get install 15/26
!%*#$ Back to Earth Network 16/26
With Extra Boxes Now you have some promblems to solve:
• locks • logs • load 17/26
Net::ZooKeeper::Lock Apache ZooKeeperTM is an effort to develop and maintain
an open-source server which enables highly reliable distributed coordination. Net::ZooKeeper::Lock implements distributed locks via ZooKeeper. 18/26
Introducing Switchman https://github.com/komarov/switchman 19/26
Overview 20/26
Configuration Crontabs are installed everywhere, switchman consults with config in
ZooKeeper: { "groups": { "scripts1": "box1", "scripts2": "box1", "scripts3": ["box1", "box2"] } } 21/26
Description switchman --config /how/to/connect/to/zk --group scriptsN -- CMD ARGS 22/26
Description switchman --config /how/to/connect/to/zk --group scriptsN -- CMD ARGS •
checks configuration • acquires a lock • watches configuration for changes • stops execution when it is not allowed anymore 22/26
Description switchman --config /how/to/connect/to/zk --group scriptsN -- CMD ARGS •
checks configuration • acquires a lock • watches configuration for changes • stops execution when it is not allowed anymore Easy to adopt with METADATA 22/26
One Problem Solved • locks • logs • load 23/26
Further Steps See facebook’s Scribe for collecting decentralized logs Resources
reservation and management A good monitoring system 24/26
Thanks! Questions? https://speakerdeck.com/komarov/lpw-2012 http://bit.ly/VLuT6g http://about.me/komarov om 25/26
Bonus Slide Get file age: # in days perl -E
’say -M $ARGV[0]’ /path/to/file # in seconds expr ‘date +%s‘ - ‘date +%s -r /path/to/file‘ Simple local locks: use Pid::File::Flock qw/:auto/; 26/26