Random Numbers in Programming: PRNG, CSPRNG, and When It Matters

9 min2026年6月5日

What "Random" Means in Computing

Computers are deterministic machines. Given the same input, a CPU will always produce the same output — that is the entire point of a processor. So how does a deterministic machine produce random numbers? The short answer: it fakes it (pseudo-random number generators) or it measures something physical that is genuinely unpredictable (true random number generators).

True randomness in computing comes from entropy sources — physical phenomena that are unpredictable at the quantum or chaotic level. Your operating system collects entropy from hardware interrupt timing, mouse movements, disk seek times, thermal noise in circuits, and dedicated hardware random number generators (Intel's RDRAND instruction, for example). This entropy is pooled and used to seed algorithms that stretch a small amount of true randomness into a large stream of usable random bits.

The distinction matters because "random-looking" is not the same as "unpredictable." A sequence can pass every statistical test for randomness — uniform distribution, no serial correlation, no detectable pattern — while still being entirely predictable if you know the algorithm and its internal state. For games and simulations, statistical randomness is enough. For security, you need unpredictability even to an adversary who knows the algorithm.

Most programming languages give you two APIs: a fast pseudo-random generator (Math.random(), random.random(), rand()) and a cryptographically secure one (crypto.getRandomValues(), os.urandom(), /dev/urandom). Understanding when each is appropriate is the difference between a working simulation and a security vulnerability.

PRNG: How Math.random() Works

A pseudo-random number generator (PRNG) is a mathematical function that takes an internal state (the "seed") and produces a sequence of numbers that appear random but are entirely determined by that seed. The same seed always produces the same sequence. The algorithm updates its internal state with each call, producing a new output and a new state.

Classic PRNG algorithms include Linear Congruential Generators (LCG), which compute next = (a * current + c) mod m. These are fast (one multiply, one add, one modulo) but have well-known weaknesses: low-order bits cycle with short periods, consecutive values fall on hyperplanes in multi-dimensional space (the Marsaglia effect), and the period is at most m. Modern browsers use xorshift128+ or similar algorithms for Math.random(), which have better statistical properties and periods of 2^128.

Every PRNG has a period — the length before the sequence repeats. For a 32-bit LCG, the period is at most 2^32 (about 4 billion values). For xorshift128+, it's 2^128 (a number so large it exceeds the atoms in the observable universe). But period alone doesn't make a PRNG secure. xorshift128+ has excellent statistical properties and a huge period, but given a few consecutive outputs, an attacker can reconstruct the internal state and predict all future outputs.

The key limitation: PRNGs are deterministic. If you can observe enough outputs, you can reverse-engineer the internal state. For Math.random() in V8 (Chrome/Node.js), this requires observing only a few outputs. Researchers have published algorithms that recover the xorshift128+ state from as few as 3-4 consecutive outputs of Math.random(). This means any value derived from Math.random() — session tokens, random IDs, temporary passwords — is predictable to a motivated attacker.

CSPRNG: Cryptographically Secure Randomness

A cryptographically secure PRNG (CSPRNG) has one additional guarantee beyond statistical randomness: given any number of previous outputs, it is computationally infeasible to predict the next output. "Computationally infeasible" means that even with all the computing power on Earth, predicting the next bit would take longer than the age of the universe. This property is what separates security-grade randomness from simulation-grade randomness.

CSPRNGs work by combining true entropy from hardware sources with cryptographic algorithms (typically AES in counter mode, ChaCha20, or SHA-based constructions). The OS kernel maintains an entropy pool fed by hardware events. When you call crypto.getRandomValues() in JavaScript or os.urandom() in Python, the kernel extracts bits from this pool and stretches them through a cryptographic function that makes the output indistinguishable from true randomness even to an adversary who knows the algorithm.

On Linux, /dev/urandom uses the ChaCha20 cipher seeded from the kernel entropy pool. On Windows, BCryptGenRandom uses AES-CTR-DRBG. On macOS, /dev/urandom uses Fortuna with SHA-256. All of these are considered safe for security-critical applications. The old debate about /dev/random vs /dev/urandom (blocking vs non-blocking) is settled: /dev/urandom is safe for all cryptographic purposes on modern kernels after initial boot seeding.

In Node.js and browsers, crypto.getRandomValues() fills a typed array with cryptographically secure random bytes. In Node.js, crypto.randomBytes(n) returns a Buffer of n random bytes. These are backed by the OS CSPRNG and are safe for generating passwords, tokens, encryption keys, and session identifiers. The performance cost is higher than Math.random() (typically 10-50x slower), but for security-sensitive values you generate hundreds or thousands, not billions.

When PRNG Is Fine

For games and entertainment, Math.random() is perfectly appropriate. Procedural terrain generation, enemy spawn locations, loot drops, dice rolls in a board game app, card shuffles for solitaire — none of these need to resist a determined attacker. Statistical randomness is enough. Players shouldn't notice patterns, but if a sufficiently motivated player reverse-engineers the PRNG state to predict the next card, the worst outcome is they cheat at single-player solitaire.

Monte Carlo simulations and scientific computing often prefer seeded PRNGs specifically because they are reproducible. If your simulation produces an interesting result, you need to reproduce it. A seeded PRNG with a recorded seed gives you exactly that: deterministic replay. Research papers report their PRNG seed so reviewers can verify results. The Mersenne Twister (MT19937) is the standard choice here — not because it's secure (it's not), but because its 2^19937 period and excellent equidistribution properties make it suitable for statistical sampling.

UI randomness — shuffling a playlist, randomizing the order of quiz questions, choosing a random background color, A/B test bucket assignment — all fine with Math.random(). The "attacker" here is a user who might notice non-randomness, and PRNGs pass that bar easily. A/B testing frameworks intentionally use deterministic hashing (user_id + experiment_name → bucket) so the same user always sees the same variant, which is reproducibility by design.

Performance-critical code that needs billions of random values — physics simulations, procedural generation in real-time graphics, randomized algorithms like quicksort pivot selection or randomized data structures like skip lists — benefits from the speed of hardware-friendly PRNGs. xoshiro256** generates a random 64-bit value in about 1 nanosecond. A CSPRNG would be 50x slower and provide no benefit in these contexts.

When You Need CSPRNG

Passwords and authentication tokens must be generated with a CSPRNG. If an attacker can predict the next password-reset token your server will issue, they can reset any user's password. This is not theoretical — real attacks have exploited predictable token generation. In 2012, researchers demonstrated predicting PHP session IDs generated with non-cryptographic randomness, allowing session hijacking.

UUIDs used as unguessable identifiers (API keys, invitation links, file access tokens) must use crypto-random generation. UUIDv4 should be generated with crypto.randomUUID() or by filling 122 random bits from a CSPRNG. If you generate UUIDv4 using Math.random(), the IDs are technically valid UUIDs but are predictable — an attacker who obtains one can compute others.

Encryption keys, initialization vectors (IVs), nonces, and salts all require CSPRNG output. An AES-256 key must have 256 bits of entropy from a CSPRNG. If you "generate" a key using Math.random(), the actual entropy is far less than 256 bits (the PRNG state is typically 128 bits, and an attacker who knows the algorithm can search that space). Similarly, nonces in authenticated encryption must never repeat — using a CSPRNG to generate them provides the statistical guarantee against collision.

Session IDs, CSRF tokens, OAuth state parameters, email verification codes, and any value where guessing it grants access must come from a CSPRNG. The rule is simple: if the security of your system depends on a value being unpredictable, it must be generated cryptographically. When in doubt, use the secure API — the performance difference (microseconds vs nanoseconds per value) is irrelevant for values you generate a few thousand times per day.

Common Mistakes

Modulo bias: generating a random number in a range by computing Math.random() * n and flooring, or random_bytes % n, introduces bias when n doesn't evenly divide the PRNG's output range. For example, if your PRNG outputs values 0-255 and you want 0-99, values 0-55 are slightly more likely than 56-99 (because 256 mod 100 = 56). For small ranges this bias is negligible, but for cryptographic applications or large-scale simulations, use rejection sampling: generate values and discard those outside your range.

Seeding with the current time is a classic error. If an attacker knows approximately when your server started (or when a value was generated), they can try all timestamps in that window. A server that seeds its PRNG with Date.now() at startup gives an attacker a search space of maybe a few thousand milliseconds — trivial to brute-force. Always seed from OS entropy (which is what Math.random() does in modern engines, but custom PRNG libraries may not).

Using Math.random() for security-sensitive values is the most common mistake in web development. Generating temporary passwords, reset tokens, or invitation codes with Math.random() looks correct in testing (the values appear random) but fails in production against an adversary. Static analysis tools like ESLint plugins can flag calls to Math.random() in security-sensitive contexts. Always use crypto.getRandomValues() or crypto.randomUUID() for anything that must be unpredictable.

Another subtle mistake: shuffling with a biased algorithm. The Fisher-Yates shuffle is correct only if you select a random index from the remaining unshuffled elements. A common broken implementation selects from all elements on each step, producing a non-uniform distribution of permutations. Even with a perfect random source, a broken shuffle algorithm makes some orderings more likely than others.

// ❌ WRONG: Modulo bias — values 0-55 are more likely than 56-99
function biasedRandom(max) {
  const byte = crypto.getRandomValues(new Uint8Array(1))[0]; // 0-255
  return byte % max; // biased when max doesn't divide 256
}

// ✅ CORRECT: Rejection sampling eliminates bias
function unbiasedRandom(max) {
  const limit = 256 - (256 % max); // largest multiple of max ≤ 256
  let value;
  do {
    value = crypto.getRandomValues(new Uint8Array(1))[0];
  } while (value >= limit); // reject values that cause bias
  return value % max;
}

// ❌ WRONG: Using Math.random() for a security token
const token = Math.random().toString(36).slice(2); // predictable!

// ✅ CORRECT: Cryptographically secure token
function secureToken(length = 32) {
  const bytes = crypto.getRandomValues(new Uint8Array(length));
  return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
}

// ❌ WRONG: Broken shuffle (not Fisher-Yates)
function brokenShuffle(arr) {
  return arr.map((_, i) => [Math.random(), arr[i]])
    .sort(([a], [b]) => a - b)
    .map(([, v]) => v); // sort-based shuffle has subtle biases
}

// ✅ CORRECT: Fisher-Yates shuffle
function fisherYates(arr) {
  const result = [...arr];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

Distributions Beyond Uniform

Most random APIs give you uniform distribution — every value in the range is equally likely. But many real-world phenomena follow other distributions. Human heights are normally distributed (Gaussian bell curve). Time between events is exponentially distributed. Income follows a power law (Pareto distribution). If your simulation assumes uniform randomness where the real phenomenon is Gaussian, your results will be wrong.

Generating a Gaussian (normal) distribution from uniform random values uses the Box-Muller transform: given two uniform random values u1 and u2 in (0,1), compute z = sqrt(-2 * ln(u1)) * cos(2π * u2). The result z follows a standard normal distribution (mean 0, standard deviation 1). Scale and shift for any desired mean and standard deviation: value = mean + z * stddev. This is how most libraries implement random normal generation.

Weighted random selection — choosing from options with different probabilities — requires more thought than a simple modulo. If option A has weight 3, B has weight 1, and C has weight 6, you need A to appear 30% of the time, B 10%, and C 60%. The alias method preprocesses the weights into a table that allows O(1) selection regardless of the number of options. For small option sets, a cumulative distribution approach works: generate a uniform random in [0, total_weight] and walk through the cumulative weights.

Rejection sampling works for any target distribution: generate a uniform random point in a bounding rectangle, accept it if it falls under the target distribution's curve, reject and retry otherwise. This is simple but can be slow if the target distribution is very peaked (most points get rejected). For specific well-known distributions, there are efficient direct algorithms — the Ziggurat method for normal and exponential distributions is about 5x faster than Box-Muller while producing the same statistical output.