Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Amanda Sopkin - Randomness in Python: Creating...

Amanda Sopkin - Randomness in Python: Creating Chaos in an Ordered Machine/Controlled Environment

There are many computational needs for randomness--from creating a game to building a simulation involving naturally occurring randomness similar to the physical world. For most purposes using the python math module to create random numbers within a specific range can be done with no further questions, but sometimes we require a more nuanced implementation.

We will look at both pseudo-random number generators, which use statistically repeatable processes to generate seemingly random series and true random number generators, which inject physical processes like atmospheric noise to generate sequences of numbers. We will discuss the benefits and drawbacks of both approaches and common methods of implementing these two types of generators in python.

Finally, we will look at several real applications for randomness and discuss the best method for generating “randomness” in each scenario.

https://us.pycon.org/2018/schedule/presentation/137/

PyCon 2018

May 11, 2018
Tweet

More Decks by PyCon 2018

Other Decks in Programming

Transcript

  1. 1. Determined that user ids were seeded with restart time

    2. Crashed the Hacker News site 3. Predicted restart time 4. Predicted assigned user ids as users logged in 5. Impersonated discovered users @amandasopkin
  2. “...would allow NSA to determine the state of the random

    number generator, and thereby eventually be able to read all data sent over the SSL connection.” DUAL_EC_DRBG Controversy
  3. • 09/2013: One of the purposes of Bullrun is described

    as being "to covertly introduce weaknesses into the encryption standards followed by hardware and software developers around the world." DUAL_EC_DRBG Controversy
  4. • 2004: Dual EC PRNG introduced • 08/2007: Shumow and

    Ferguson present Dual_EC_DRBG flaw at cryptography conference • 11/2007: Schneier bases article in Wired on their findings DUAL_EC_DRBG Controversy
  5. • 09/2013: One of the purposes of Bullrun is described

    as being "to covertly introduce weaknesses into the encryption standards followed by hardware and software developers around the world." • 12/2013: Presidential advisory examines encryption standards • 2014: Standard is removed DUAL_EC_DRBG Controversy
  6. Who did this impact? Microsoft, Google, Apple, McAfee, Docker, IBM,

    Oracle, Cisco, VMWare, Juniper, HP, Red Hat, Samsung, Toshiba, DELL, Ruckus, F5 Networks, Lenovo, Nokia, the RSA BSAFE libraries for Java and C++ and more....
  7. 1. Pass statistical tests of randomness An ideal pseudo random

    number generator should... Monobit Distance Poker or Craps Birthday
  8. 1. Pass statistical tests of randomness 2. Take a long

    time before repeating An ideal pseudo random number generator should... Have a long “period”
  9. 1. Pass statistical tests of randomness 2. Take a long

    time before repeating 3. Execute efficiently An ideal pseudo random number generator should... & Quick Low storage
  10. 1. Pass statistical tests of randomness 2. Take a long

    time before repeating 3. Execute efficiently 4. Be repeatable An ideal pseudo random number generator should...
  11. 1. Pass statistical tests of randomness 2. Take a long

    time before repeating 3. Execute efficiently 4. Be repeatable 5. Be portable An ideal pseudo random number generator should... Can be run on any machine or system
  12. Linear congruential generators Linear congruential generators take the form xk

    = (axk−1 + c) (mod M) where x0 is the seed, the integer M is the largest representable integer, and the period is at most M.
  13. Linear combination generators a = 3 c = 9 m

    = 16 xi = 4394 def lcg(): xi = seed() for i in range(10): xi = (a*xi + c)%m print(xi)
  14. Mid square method generally Start with a 4 digit seed

    Square this value If the result has fewer than 8 digits, add leading 0s Take the middle 4 digits of the result Repeat the sequence
  15. Mid square method generally Start with a 4 digit seed

    Square this value If the result has fewer than 8 digits, add leading 0s 96707556 9834 96707556
  16. Mid square method generally Start with a 4 digit seed

    Square this value If the result has fewer than 8 digits, add leading 0s Take the middle 4 digits of the result Start with a 4 digit seed Square this value If the result has fewer than 8 digits, add leading 0s 9834 96707556 96707556 7075
  17. Mid square method generally Start with a 4 digit seed

    Square this value If the result has fewer than 8 digits, add leading 0s Take the middle 4 digits of the result Repeat the sequence Start with a 4 digit seed Square this value If the result has fewer than 8 digits, add leading 0s 9834 96707556 96707556 7075 50055625
  18. Mid square method seed_number = int(input("Please enter a four digit

    number:\n[####] ")) number = seed_number already_seen = set() counter = 0 while number not in already_seen: counter += 1 already_seen.add(number) number = int(str(number * number).zfill(8)[2:6]) print(f"#{counter}: {number}") print(f"We began with the seed {seed_number}, and" f" we repeated ourselves after {counter} steps" f" with {number}.")
  19. Mid square method Please enter a four digit number: [####]

    5859 #1: 3278 #2: 7452 #3: 5323 #4: 3343 #5: 1756 #6: 835 #7: 6972 #8: 6087 #9: 515 #10: 2652 ....... #59: 24 #60: 5 #61: 0 #62: 0 We began with the seed 5859, and we repeated ourselves after 62 steps with 0.
  20. Most used pseudo random number generator Very long period (the

    Mersenne prime: 219937 − 1) Not cryptographically secure The Mersenne Twister
  21. Predicting the random() module from random import random import matplotlib.pyplot

    as plt def uni(n, m, a, c, seed): sequence = [] Xn = seed for i in range(n): Xn = ((a*Xn + c) % m) sequence.append(Xn/float(m-1)) return(sequence) x = range(1000) y_1 = uni(1000, 2**32, 11695477, 1, datetime.now().microsecond) y_2 = [random() for i in range(1000)] plt.plot(x, y_1, "o", color="blue") plt.show() plt.plot(x, y_2, "o", color="red") plt.show()
  22. The Secrets module Is cryptographically secure Includes ready made “batteries”

    for Users that don’t want to build their own Uses 32 bytes of entropy by default
  23. Source code of Secrets module from random import SystemRandom _sysrand

    = SystemRandom() randbits = _sysrand.getrandbits choice = _sysrand.choice def randbelow(exclusive_upper_bound): return _sysrand._randbelow(exclusive_upper_bound) DEFAULT_ENTROPY = 32 # number of bytes to return by default def token_bytes(nbytes=None): if nbytes is None: nbytes = DEFAULT_ENTROPY return os.urandom(nbytes) def token_hex(nbytes=None): return binascii.hexlify(token_bytes(nbytes)).decode('ascii') def token_urlsafe(nbytes=None): tok = token_bytes(nbytes) return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')
  24. SystemRandom Uses OS as a source of randomness Not available

    on all systems Does not rely on software states Sequences are not repeatable
  25. /dev/urandom Will not block without sufficient entropy Relies on “the

    kernel entropy pool” Faster than /dev/random Theoretically vulnerable to attack
  26. Using the secrets module to get tokens import secrets token1

    = secrets.token_hex(16) token2 = secrets.token_hex(10) print(token1) print(token2) d2bdc979d5ecec0dccf67854459c5284 584d93ac921d3c74be9c
  27. Using the secrets module for password generation import secrets import

    string alphabet = string.ascii_letters + string.digits password = ''.join(secrets.choice(alphabet) for i in range(10)) print(password) i3OFMKPr8q
  28. Python’s “nuclear reactor” of Randomness "...folks really are better off

    learning to use things like cryptography.io for security sensitive software, so this change is just about harm mitigation given that it's inevitable that a non-trivial proportion of the millions of current and future Python developers won't do that."
  29. Is very important for security Difficult to truly achieve Can

    be simulated Randomness... @amandasopkin
  30. Sources: • Icons taken from flaticon.com • https://crypto.stackexchange.com/questions/51232/using- 32-hexadecimal-digits-vs-ascii-equivalent-16-character- password

    • https://dev.to/walker/pseudo-random-numbers-in-python-f rom-arithmetic-to-probability-distributions • Wired Magazine • The Washington Post • NYT • Dilbert