Paradoxes and theorems every developer should know
Joshua Thijssen
December 08, 2015
Transcript
Joshua Thijssen (jaytaph)
Disclaimer: I'm not a (mad) scientist nor a mathematician.
Second disclaimer: I will only tell lies.
German Tank Problem
Observed serial numbers: 15
Observed serial numbers: 53, 72, 8, 15
k = number of elements, m = largest number; estimate = m + (m / k) - 1
72 + (72 / 4) - 1 = 89
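The estimate above can be written as a small function. This is a minimal sketch: the name `estimatePopulation` is hypothetical, and it assumes serial numbers start at 1 and were sampled without replacement.

```php
<?php
/**
 * Estimate the population size from observed serial numbers,
 * using the formula from the slide: m + (m / k) - 1.
 *
 * @param int[] $serials observed serial numbers (assumed to start at 1)
 */
function estimatePopulation(array $serials): float
{
    if ($serials === []) {
        throw new InvalidArgumentException('At least one serial number is required.');
    }

    $k = count($serials); // number of elements
    $m = max($serials);   // largest number

    return $m + ($m / $k) - 1;
}

echo estimatePopulation([53, 72, 8, 15]), PHP_EOL; // 89
```

With a single observation of 15, the same formula gives 29, which shows how rough the estimate is for very small samples.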
Month         Intelligence estimate   Statistical estimate   Actual
June 1940     1000                    169                    122
June 1941     1550                    244                    271
August 1942   1550                    327                    342
Source: https://en.wikipedia.org/wiki/German_tank_problem
➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.
➡ Calculate approximations of datasets with (incomplete) information.
➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs, you don't need UUIDs).
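One possible way to keep sequential database keys out of URLs and invoices is to expose a separate, opaque identifier. The sketch below is an assumption, not the scheme from the talk: it uses purely random tokens rather than time-based short IDs, and both the helper name `generatePublicId` and the 8-byte length are hypothetical choices.

```php
<?php
/**
 * Generate an opaque public identifier that reveals nothing about
 * how many records exist. Hypothetical helper; 8 random bytes
 * (16 hex characters) is an arbitrary choice for illustration.
 */
function generatePublicId(int $bytes = 8): string
{
    // random_bytes() draws from the operating system's CSPRNG.
    return bin2hex(random_bytes($bytes));
}

echo generatePublicId(), PHP_EOL;
```

The internal auto-increment key can stay in place for joins; only the random identifier would be shown to users. A unique index on that column is still advisable, since random values can collide.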
Collecting (big) data is easy. Analyzing big data is the hard part.
Confirmation Bias
2 4 6
Z = {…, −2, −1, 0, 1, 2, …}
21%
Cards: 5, 8, ?, ?
If a card shows an even number on one face, then its opposite face is blue.
< 10%
Cards: coke, beer, 35, 17
If you drink beer, then you must be 18 years or older.
Cognitive adaptation for social exchange
Hint: try to place your "technical problem" in a more social context.
BDD
Cards: 5, 8, ?, ?
If a card shows an even number on one face, then its opposite face is blue.
TESTING
➡ Step 1: Write code
➡ Step 2: Write tests
➡ Step 3: Profit
// Passes every test listed below, yet is wrong for years such as 1900 and 2100.
public function isLeapYear($year) {
    return ($year % 4 == 0);
}

testIs1996ALeapYear();
testIs2000ALeapYear();
testIs2004ALeapYear();
testIs2008ALeapYear();
testIs2012ALeapYear();
testIs1997NotALeapYear();
testIs1998NotALeapYear();
testIs2001NotALeapYear();
testIs2013NotALeapYear();

https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing
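For contrast with the leap-year example above, here is a sketch of the full Gregorian rule together with checks that try to disprove it. The standalone function and the chosen years are illustrative additions, not taken from the slides.

```php
<?php
/**
 * Gregorian leap-year rule: divisible by 4, except century years,
 * unless the century year is also divisible by 400.
 */
function isLeapYear(int $year): bool
{
    return ($year % 4 === 0 && $year % 100 !== 0) || $year % 400 === 0;
}

// Cases picked to break the naive "divisible by 4" rule.
var_dump(isLeapYear(1900)); // bool(false)
var_dump(isLeapYear(2000)); // bool(true)
var_dump(isLeapYear(2100)); // bool(false)
```

The years 1900 and 2100 are the interesting ones: the `% 4` version reports both as leap years, and none of the original tests would have caught that.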
➡ Tests were written based on actual code.
➡ Tests were written to CONFIRM actual code, not to DISPROVE actual code!
TDD
➡ Step 1: Write tests
➡ Step 2: Write code
➡ Step 3: Profit, as this is less prone to confirmation bias (there is no code yet to be biased by!)
Birthday paradox
Question: how many people do you need for a > 50% chance that two of them share a birthday?
4 March, 18 September, 5 December, 25 July, 2 February, 9 October
23 people
366 persons = 100%
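The 23-person figure can be checked by computing the chance that everyone has a different birthday and subtracting it from 1. This is a minimal sketch; the function name is hypothetical, and it assumes 365 equally likely birthdays and ignores leap days.

```php
<?php
/**
 * Probability that at least two people in a group share a birthday,
 * assuming $days equally likely birthdays.
 */
function sharedBirthdayProbability(int $people, int $days = 365): float
{
    if ($people > $days) {
        return 1.0; // pigeonhole principle: a shared birthday is certain
    }

    $allDifferent = 1.0;
    for ($i = 0; $i < $people; $i++) {
        // Person $i must avoid the $i birthdays already taken.
        $allDifferent *= ($days - $i) / $days;
    }

    return 1.0 - $allDifferent;
}

printf("%.3f\n", sharedBirthdayProbability(23)); // 0.507
```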
Collisions occur more often than you realize
Hash collisions
16 bits means about 300 values before a > 50% collision probability
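The same birthday calculation applies to hashes: count how many values fit into a hash space before a collision becomes more likely than not. The sketch below assumes uniformly distributed hash values; the function name is hypothetical, and the loop is only practical for small bit widths (roughly up to 40 bits).

```php
<?php
/**
 * Number of uniformly random values needed before the chance of at
 * least one collision in a $bits-wide hash space exceeds $threshold.
 */
function valuesForCollisionChance(int $bits, float $threshold = 0.5): int
{
    $space = 2 ** $bits; // number of possible hash values
    $noCollision = 1.0;
    $n = 0;

    while (1.0 - $noCollision <= $threshold) {
        // The next value must avoid the $n values already drawn.
        $noCollision *= ($space - $n) / $space;
        $n++;
    }

    return $n;
}

echo valuesForCollisionChance(16), PHP_EOL; // roughly 300
```

Every extra bit only multiplies the threshold by about the square root of 2, which is why small hashes fill up far sooner than intuition suggests.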
Watch out for:
➡ Hashes that are too small.
➡ Unique data.
➡ Your data might be less "protected" than you might think.
Heisenberg uncertainty principle
It's not about Star Trek (Heisenberg compensators)
nor crystal meth
x = position
p = momentum (mass x velocity)
ħ = 0.0000000000000000000000000000000001054571800 (1.054571800E-34)
The more precisely you know one property, the less precisely you know the other.
This is NOT about observing!
Observer effect: heisenbug
It's about trade-offs
Benford's law
Numbers beginning with 1 are more common than numbers beginning with 9.
Default behavior for natural numbers.
find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

1073 1
 886 2
 636 3
 372 4
 352 5
 350 6
 307 7
 247 8
 222 9
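For comparison with the counts above, Benford's law predicts that a leading digit d occurs with probability log10(1 + 1/d). The sketch below prints that expected distribution; the function name is a hypothetical choice.

```php
<?php
/**
 * Expected share of numbers whose first digit is $digit,
 * according to Benford's law: log10(1 + 1/d).
 */
function benfordExpected(int $digit): float
{
    if ($digit < 1 || $digit > 9) {
        throw new InvalidArgumentException('Leading digit must be between 1 and 9.');
    }

    return log10(1 + 1 / $digit);
}

foreach (range(1, 9) as $digit) {
    printf("%d: %.1f%%\n", $digit, 100 * benfordExpected($digit));
}
```

The law predicts about 30% for a leading 1 and under 5% for a leading 9. The line counts above follow the same downward trend, although less steeply.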
Bayesian filtering
What's the probability of an event, based on conditions that might be related to the event?
What is the chance that a message is spam when it contains certain words?
P(A|B) = probability of event A, given event B (conditional)
P(A) = probability of event A
P(B) = probability of event B
P(B|A) = probability of event B, given event A
➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.
➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".
[Venn diagram: "negative" (10 comments), "horrible" (25 comments), overlap (8 comments)]
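Plugging the comment counts into Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B), gives the chance that a comment containing "horrible" is negative. The helper name `bayes` and the variable names are hypothetical; the numbers are the ones from the slides.

```php
<?php
/**
 * Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
 */
function bayes(float $pBGivenA, float $pA, float $pB): float
{
    if ($pB <= 0.0) {
        throw new InvalidArgumentException('P(B) must be greater than zero.');
    }

    return $pBGivenA * $pA / $pB;
}

$pNegative = 10 / 50;             // P(A): comment is negative
$pHorrible = 25 / 50;             // P(B): comment contains "horrible"
$pHorribleGivenNegative = 8 / 10; // P(B|A): negative comments containing "horrible"

printf("%.2f\n", bayes($pHorribleGivenNegative, $pNegative, $pHorrible)); // 0.32
```

The result matches the direct count: 8 of the 25 comments containing "horrible" are negative, which is 32%.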
➡ More words?
➡ Complex algorithm,
➡ but we can assume that words are independent of each other
➡ Naive Bayes approach
We must know beforehand which comments are negative?
TRAINING SET
Negative: "Your product is horrible and does not work properly. Also, you suck."
Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"
// Training set after the two example comments above.
$trainingset = [
    'negative' => [
        'count' => 1,
        'words' => [
            'product' => 1,
            'horrible' => 1,
            'properly' => 1,
            'suck' => 1,
        ],
    ],
    'positive' => [
        'count' => 1,
        'words' => [
            'horrible' => 1,
            'experience' => 1,
            'product' => 1,
            'thank' => 1,
        ],
    ],
];

// Training set after many more comments.
$trainingset = [
    'negative' => [
        'count' => 631,
        'words' => [
            'product' => 521,
            'horrible' => 52,
            'properly' => 36,
            'suck' => 272,
        ],
    ],
    'positive' => [
        'count' => 1263,
        'words' => [
            'horrible' => 62,
            'experience' => 16,
            'product' => 311,
            'great' => 363,
            'thank' => 63,
        ],
    ],
];
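A training set in this shape can drive a very small Naive Bayes classifier. The sketch below rests on assumptions the slides do not state: each word count is treated as the number of comments in that class containing the word, add-one smoothing is used for unseen words, and the `classify` function is hypothetical.

```php
<?php
// Counts copied from the larger training set in the slides.
$trainingSet = [
    'negative' => [
        'count' => 631,
        'words' => ['product' => 521, 'horrible' => 52, 'properly' => 36, 'suck' => 272],
    ],
    'positive' => [
        'count' => 1263,
        'words' => ['horrible' => 62, 'experience' => 16, 'product' => 311, 'great' => 363, 'thank' => 63],
    ],
];

/**
 * Return the label with the highest Naive Bayes score for the given words.
 *
 * @param string[] $words lower-cased words from the comment
 */
function classify(array $words, array $trainingSet): string
{
    $totalComments = array_sum(array_column($trainingSet, 'count'));
    $bestLabel = '';
    $bestScore = -INF;

    foreach ($trainingSet as $label => $data) {
        // Start with the prior P(label); logs avoid underflow on long inputs.
        $score = log($data['count'] / $totalComments);

        foreach ($words as $word) {
            $seen = $data['words'][$word] ?? 0;
            // Assumed add-one smoothing so unseen words do not zero the score.
            $score += log(($seen + 1) / ($data['count'] + 2));
        }

        if ($score > $bestScore) {
            $bestScore = $score;
            $bestLabel = $label;
        }
    }

    return $bestLabel;
}

echo classify(['horrible', 'suck'], $trainingSet), PHP_EOL; // negative
echo classify(['great', 'thank'], $trainingSet), PHP_EOL;   // positive
```

Multiplying the per-word probabilities is exactly where the "naive" independence assumption enters; the logarithms only turn that product into a sum.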
➡ You might want to filter stop-words first.
➡ You might want to make sure negations are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.
➡ Collaborative filtering (mahout):
➡ If a user likes products A, B and C, what is the chance that they like product D?
Mess up your (training) data, and nothing can save you (except a training set reboot)
Binomial probability
➡ 30% chance of acceptance for a CFP
➡ 5 CFPs
1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
An 83% chance of getting selected at least once!
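The calculation above generalizes to any number of independent attempts: the chance of at least one success is 1 minus the chance that every attempt fails. A minimal sketch, with a hypothetical function name and assuming each CFP is accepted independently with the same probability:

```php
<?php
/**
 * Probability of at least one success in $attempts independent tries,
 * each succeeding with probability $p.
 */
function chanceOfAtLeastOne(float $p, int $attempts): float
{
    return 1.0 - (1.0 - $p) ** $attempts;
}

printf("%.3f\n", chanceOfAtLeastOne(0.3, 5)); // 0.832
```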
Ockham's Razor
Among competing hypotheses, the one with the fewest assumptions should be selected.
Everything should be made as simple as possible, but no simpler.
YAGNI
Actually,
➡ The principle of plurality: plurality should not be posited without necessity.
➡ The principle of parsimony: it is pointless to do with more what is done with less.
➡ Every element you add needs: design, development, maintenance, connectivity, support, etc.
➡ When "adding" elements, you are not adding, you are multiplying!
Food for thought: Would Ockham accept a Service Oriented Architecture?
http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg
Find me on twitter: @jaytaph
Find me for development and training: www.noxlogic.nl / www.techademy.nl
Find me on email:
[email protected]
Find me for blogs: www.adayinthelifeof.nl