Machine Learning without the Hype

Machine Learning without the Hype, Philipp Krenn (@xeraa)

Developer

Machine Learning is going viral...

❝Using #DeepLearning when all you needed was a few if statements. #MachineLearning #DataScience❞ —https://twitter.com/randal_olson/status/927157485240311808

Agenda
Machine Learning
Domain
Dataset

Machine Learning

Artificial Intelligence
Machine Learning
Deep Learning

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

General AI Human characteristics

AI Winter

Narrow AI Specific tasks

Facebook alt="Image may contain: ocean, sky, bridge, cloud, outdoor, water and nature"

PS: A lot of Chatbots are not AI

❝Alice: I love stateless protocols! Bob: There has to be something bad about them. Alice: Bad about what?❞ —https://twitter.com/znjp/status/933405548678021120

Machine Learning
Algorithms parse data → learn from it → make a determination or prediction
"Trained" machine

❝A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.❞ —Tom M. Mitchell, Machine Learning (1997)

❝"Machine Learning is an emerging tech!"
Logistic regression 1958
Hidden Markov Model 1960
Support Vector Machine 1963
k-nearest neighbors 1967
Artificial Neural Networks 1975
Expectation Maximization 1977
Decision tree 1986
Q-learning 1989
Random forest 1995❞ —https://twitter.com/farbodsaraf/status/977916871000412160

https://twitter.com/algorithmia/status/1009486664933052416

❝But saying "powered by AI" is like saying you’re "powered by the internet" or "powered by computer code". By itself, it means nothing.❞ —https://twitter.com/jensenharris/status/999119292086960128

Learning Regression Ranking Clustering

https://twitter.com/ShawnWildermuth/status/932724124237123584

For children and machines Watch your language

Statistics 101: Linear Regression
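As a reminder of how little machinery "Statistics 101" needs, here is a minimal sketch of ordinary least squares for a single input feature. The function name `fit_line` and the data points are made up for illustration; they do not come from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
```

With these points the fit recovers a slope of 2 and an intercept of 1. Real data adds noise, but the arithmetic stays the same.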

❝We are leveraging machine learning.❞

https://twitter.com/LesGuessing/status/997146590442799105

Supervised Learning
Input features and output labels are defined

Unsupervised Learning
Unlabeled dataset
Discover hidden relationships

https://xkcd.com/882/

Reinforcement Learning Feedback loop to optimize some parameter

Deep Learning
Neural network producing a probability vector
Lots of training and parallelization

https://www.youtube.com/watch?v=bxe2T-V8XRs

Access to a unique data set is inherently valuable

❝"What's the difference between AI and ML?" "It's AI when you're raising money, it's ML when you're trying to hire people."❞ —https://twitter.com/WAWilsonIV/status/925599712849174528

Domain

Patterns
Trend (stationary)
Cyclical
Seasonal
Irregular

Anomaly
Point Anomalies
Contextual Anomalies
Collective Anomalies
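The difference between a point anomaly and a contextual anomaly can be sketched in a few lines. This is only an illustration under simple assumptions: a three-sigma rule on the sample mean and standard deviation, with hour of day as the context. The function names and the request counts are hypothetical. Collective anomalies, where only a group of values together is unusual, are not covered here.

```python
from statistics import mean, stdev


def is_point_anomaly(value, history, z=3.0):
    """True if value is more than z standard deviations from the mean of history."""
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) > z * sigma


def is_contextual_anomaly(value, hour, history_by_hour, z=3.0):
    """Same rule, but only compared against history from the same hour of day."""
    return is_point_anomaly(value, history_by_hour[hour], z)


night = [10, 12, 11, 9, 10]     # hypothetical requests at 03:00 on previous days
day = [100, 95, 105, 98, 102]   # hypothetical requests at 14:00 on previous days
history_by_hour = {3: night, 14: day}

value = 55  # observed at 03:00
globally_odd = is_point_anomaly(value, night + day)
odd_for_3am = is_contextual_anomaly(value, 3, history_by_hour)
```

Here 55 requests sit right in the middle of the overall range, so the global check stays quiet, while the same value is far outside what 03:00 normally looks like.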

Breakouts
Mean Shift
Ramp Up
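A mean shift can be located with a brute-force sketch: try every split point and keep the one where the mean after the split differs most from the mean before it. This is an illustrative simplification, not a specific breakout detection library; `find_mean_shift`, `min_size` and the series are assumptions. A gradual ramp up would need a different approach, for example comparing fitted slopes.

```python
from statistics import mean


def find_mean_shift(series, min_size=3):
    """Return (index, shift) of the split with the largest jump in mean.

    index is the first position of the new level; shift is the new mean
    minus the old mean. min_size keeps both sides from being too short.
    """
    best_index, best_shift = None, 0.0
    for i in range(min_size, len(series) - min_size + 1):
        shift = mean(series[i:]) - mean(series[:i])
        if abs(shift) > abs(best_shift):
            best_index, best_shift = i, shift
    return best_index, best_shift


# Made-up metric that jumps from around 10 to around 20
series = [10, 11, 9, 10, 10, 20, 21, 19, 20, 20]
index, shift = find_mean_shift(series)
```

For this series the best split is at index 5, with a shift of 10.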

Anomaly Detection with Machine Learning
Supervised Learning
Unsupervised Learning

Examples
IT operations: Spiking 500s
Security analytics: Unusual DNS activity
Business analytics: Rare log message

Visual Inspection
Complex, fast moving data
Humans not made to stare at graphs
Easy to miss

Where is the Anomaly?

Static Rules
Definition
False positives & negatives
Tuning and adjustment

Which threshold?
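The threshold question is easy to reproduce with a toy example. The sketch below applies one fixed threshold to a metric with a daily rhythm; the numbers and the name `static_rule` are invented for illustration.

```python
def static_rule(values, threshold):
    """Return the indexes of all values above a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical requests per minute: quiet night-time values first, then the daytime peak.
# Index 2 (60 at night) is the real problem; index 5 (105) is an ordinary peak.
values = [10, 12, 60, 95, 100, 105, 98]

high = static_rule(values, 100)  # flags only the ordinary peak and misses the real problem
low = static_rule(values, 50)    # catches the problem, but also flags the whole daytime
```

Neither threshold works: 100 produces one false positive and one false negative, and 50 flags five of the seven points.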

Machine learning

❝OH: "Do you run any CPU intensive application on your laptop? Like, machine learning, or Slack?"❞ —https://twitter.com/jpetazzo/status/932464823530430464

Frameworks
TensorFlow
Keras
scikit-learn
...

How to build ML pipelines?
ETL
Data storage
Optimization algorithms

❝I see you expected clean data. That's cute.❞

Model Baseline: What is normal?

Unsupervised

Evolves "Online" model learns continuously and ages out data
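One way to picture a model that learns continuously and ages out data is an exponentially weighted mean and variance. This is a generic sketch, not the model used by any particular product; the class name `OnlineBaseline`, the `alpha` value and the sample data are assumptions.

```python
import math


class OnlineBaseline:
    """Exponentially weighted mean and variance; older data fades out over time."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # higher alpha forgets old data faster
        self.mean = None
        self.var = 0.0

    def score(self, x):
        """Distance of x from the current baseline, in standard deviations."""
        if self.mean is None or self.var == 0.0:
            return 0.0  # no baseline yet
        return abs(x - self.mean) / math.sqrt(self.var)

    def update(self, x):
        """Fold one new observation into the baseline."""
        if self.mean is None:
            self.mean = x
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)


baseline = OnlineBaseline(alpha=0.1)
for i in range(50):
    baseline.update(9 if i % 2 else 11)   # made-up metric hovering around 10
before = baseline.score(50)               # far outside the learned baseline

for i in range(200):
    baseline.update(49 if i % 2 else 51)  # the metric settles at a new level
after = baseline.score(50)                # the new level has become normal
```

After the metric has stayed near 50 for a while, 50 no longer scores as unusual, while the old normal of 10 now does.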

Single Time Series Example: Unusual traffic?

Multiple Time Series
Multiple metrics or single metric split up
Each series modeled independently
Example: Unusual activity by country?
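Splitting a single metric by country and modeling each series independently could look like the sketch below. It reuses a plain three-sigma rule per key purely for illustration; the function name, the country codes and the counts are hypothetical.

```python
from statistics import mean, stdev


def unusual_by_key(history, latest, z=3.0):
    """history: {key: [past counts]}, latest: {key: newest count}.

    Each key gets its own baseline; returns the keys whose newest count
    is more than z standard deviations away from that key's own mean.
    """
    unusual = []
    for key, value in latest.items():
        past = history.get(key, [])
        if len(past) < 2:
            continue  # not enough data for a baseline
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(value - mu) > z * sigma:
            unusual.append(key)
    return sorted(unusual)


# Made-up request counts per time bucket, split by country
history = {"IN": [100, 110, 90, 105, 95], "AT": [5, 6, 4, 5, 6]}
latest = {"IN": 102, "AT": 40}
flagged = unusual_by_key(history, latest)
```

Only "AT" is flagged here: 40 requests are nothing special for a busy country, but they are far outside the baseline of a quiet one.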

Dataset

nginx access log
{
  "source": "/home/ec2-user/data/production-4/prod4elasticlog/_logs/access-logs541.log",
  "beat": {
    "hostname": "ip-172-31-5-206",
    "name": "ip-172-31-5-206",
    "version": "5.4.0"
  },
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "read_timestamp": "2017-06-20T08:49:58.538Z",
  "fileset": { "name": "access", "module": "nginx" },
  "nginx": {
    "access": {
      "body_sent": { "bytes": "3262" },
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": {
        "continent_name": "Asia",
        "country_iso_code": "IN",
        "location": { "lat": 20, "lon": 77 }
      },
      "response_code": "200",
      "user_agent": { "device": "Other", "os_name": "Other", "os": "Other", "name": "Other" },
      "http_version": "1.1",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  },
  "prospector": { "type": "log" }
}
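Before any model sees this data, each log document has to be turned into something countable. The sketch below parses a trimmed copy of the document above and counts requests per minute and country. The field paths match the sample; the helper names and the choice of one-minute buckets are assumptions for illustration.

```python
import json
from collections import Counter
from datetime import datetime

# Trimmed copy of the sample document, as one JSON object per line
line = (
    '{"@timestamp": "2017-03-08T11:44:51.562Z", "nginx": {"access": {'
    '"body_sent": {"bytes": "3262"}, "response_code": "200", '
    '"geoip": {"country_iso_code": "IN"}}}}'
)


def extract(raw):
    """Pull the fields of interest out of one log document."""
    doc = json.loads(raw)
    access = doc["nginx"]["access"]
    ts = datetime.strptime(doc["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "minute": ts.replace(second=0, microsecond=0),  # bucket start
        "status": int(access["response_code"]),         # stored as a string in the log
        "country": access["geoip"]["country_iso_code"],
        "bytes": int(access["body_sent"]["bytes"]),
    }


def count_per_minute(lines):
    """Count requests per (minute, country) bucket."""
    return Counter((e["minute"], e["country"]) for e in map(extract, lines))


event = extract(line)
counts = count_per_minute([line, line])
```

Note that the numeric fields arrive as strings in this sample, so they are converted explicitly.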

Most of the internet went down

PS: When everything is on fire, nobody cares about your downloads

Counterfactual Reasoning
Which host / IP / ... is involved in the anomaly
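The counterfactual idea can be sketched as a simple question: if this one entity had not been there, would the bucket still look anomalous? The code below is a deliberately naive illustration, not how any particular product computes it; the function name, the expected total, the tolerance and the IP addresses (taken from the documentation range) are all assumptions.

```python
def influencers(counts_by_ip, expected_total, tolerance):
    """Return the IPs whose removal brings the bucket back to normal.

    A bucket counts as normal when its total is within tolerance of
    expected_total. Naive: it ignores each IP's own share of normal traffic.
    """
    total = sum(counts_by_ip.values())
    if abs(total - expected_total) <= tolerance:
        return []  # the bucket is not anomalous to begin with
    return sorted(
        ip
        for ip, count in counts_by_ip.items()
        if abs((total - count) - expected_total) <= tolerance
    )


# Made-up request counts for one anomalous time bucket
counts = {"192.0.2.1": 40, "192.0.2.2": 35, "192.0.2.3": 900}
involved = influencers(counts, expected_total=100, tolerance=30)
```

Removing "192.0.2.3" drops the total from 975 to 75, which is back within the expected range, so that address is reported as involved.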

Combine Multiple Models

Correlation ≠ causation

Common problems
Correlated features will mess up any model
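A quick check for correlated features is the pairwise Pearson correlation. The sketch below flags feature pairs above a cut-off; the feature names, the values and the 0.95 limit are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlated_pairs(features, limit=0.95):
    """Return the feature name pairs whose absolute correlation reaches limit."""
    return [
        (a, b)
        for a, b in combinations(sorted(features), 2)
        if abs(pearson(features[a], features[b])) >= limit
    ]


# Made-up features: two of them carry the same information in different units
features = {
    "bytes": [100, 200, 300, 400, 500],
    "kilobytes": [0.1, 0.2, 0.3, 0.4, 0.5],
    "latency_ms": [30, 10, 50, 20, 40],
}
pairs = correlated_pairs(features)
```

Only the "bytes" and "kilobytes" pair is flagged, so one of the two can be dropped before training.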

Common problems
Throw out most features if they are just noise

More features Future predictions

More features Clustering

Conclusion

Agenda
Machine Learning
Domain
Dataset

Rules of Machine Learning: Best Practices for ML Engineering
http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

43 rules
Rule #1: Don’t be afraid to launch a product without machine learning
Rule #14: Starting with an interpretable model makes debugging easier
Rule #16: Plan to launch and iterate
