RandomX · Monero's CPU-friendly proof of work

01 · the problem

§1 Why Monero needed a new PoW

Every PoW chain eventually faces the ASIC question. Bitcoin embraced them. Monero refused. Monero ran the CryptoNight family from 2014 to 2019 and, once ASICs appeared in 2018, played whack-a-mole with them, tweaking the algorithm roughly every six months. It was exhausting, partially effective, and clearly not sustainable. RandomX is the strategic pivot: instead of hiding from ASICs, make the algorithm so hostile to specialization that no ASIC can beat a commodity CPU.

The ASIC pressure

Custom silicon optimized for one hash function (SHA-256, scrypt, etc.) outperforms general-purpose hardware by 100,000× or more. Result: mining centralizes around the few who can afford the silicon.

What broke CryptoNight

CryptoNight tried to be memory-hard. ASICs eventually got built anyway (Bitmain Antminer X3, 2018). The Monero community responded with emergency forks that bricked the ASICs — but the cycle was unwinnable.

The pivot

Tevador, SChernykh, hyc, and others spent ~9 months designing RandomX. The premise: don't try to dodge ASICs — design a PoW where a CPU has every structural advantage and silicon would have to recreate a CPU to compete.

The strategic shift in one sentence

CryptoNight was an obstacle course hoping ASICs couldn't navigate it. RandomX is an art exhibit where the only valid medium happens to be commodity silicon. Anything less general — an ASIC, an FPGA, a GPU — necessarily produces an inferior copy.

02 · the big idea

§2 Random programs, not random data

Classical PoW (Bitcoin's SHA-256d, scrypt, etc.) runs the same function on different inputs. RandomX runs a different function for every input. Every hash attempt generates fresh ~256-instruction programs (eight per hash, see §4), JIT-compiles them to native machine code, and executes them against the virtual machine's registers and memory. The space of possible programs is ~2⁵¹², far too many for an ASIC to pre-bake.
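To make "a fresh program per hash" concrete, here is a minimal sketch of seed-driven program generation. It is a toy, not the RandomX generator: BLAKE2b in counter mode stands in for RandomX's AES-based generator, and the eight-entry opcode table and the `generate_program` helper are assumptions made for illustration.

```python
import hashlib
import struct

# Toy opcode table: the names appear in the text, the selection rule is an assumption.
OPCODES = ["IADD_RS", "IXOR_R", "IMUL_R", "IROR_R",
           "FADD_R", "FMUL_R", "CBRANCH", "ISTORE"]
PROGRAM_SIZE = 256  # instructions per program, per the text


def generate_program(seed: bytes, size: int = PROGRAM_SIZE):
    """Expand a seed into `size` pseudo-random (opcode, dst, src, imm32) tuples."""
    program = []
    for index in range(size):
        # Stand-in generator: BLAKE2b keyed by seed + counter, 8 bytes per instruction.
        word = hashlib.blake2b(seed + struct.pack("<I", index), digest_size=8).digest()
        opcode_byte, dst, src, _mod, imm32 = struct.unpack("<BBBBI", word)
        program.append((OPCODES[opcode_byte % len(OPCODES)], dst % 8, src % 8, imm32))
    return program


program_a = generate_program(b"nonce-1")
program_b = generate_program(b"nonce-2")
```

Changing a single byte of the seed yields an unrelated instruction sequence, which is the property that leaves fixed-function circuitry with nothing stable to hard-code.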

Why "random programs" defeats ASICs

An ASIC achieves its speed by hard-coding the operations it performs. If the operations change every hash, the ASIC's hard-coded circuitry is useless — it would have to implement an instruction decoder, register file, ALU, FPU, branch predictor, and memory subsystem. At which point it has reinvented a CPU — and a less efficient one than what Intel and AMD already ship.

[Figure: "Each hash = new program". Sample programs: prog #1 (IMUL_R r3,r1 · FADD f0,a2), prog #2 (IXOR_R r5,r2 · CBRANCH 0x42), prog #3 (FSQRT_R e1 · ISTORE L1), prog #4 (FMUL_R e3,a1 · IADD_RS r7), prog #5 (FDIV_M e0,L2 · IROR_R r4), … prog #N, out of 2⁵¹² total. Every hash compiles a fresh ~256-instruction program to native code, runs it, and hashes the result. A CPU manages ~thousands of hashes/sec; competing hardware needs to handle any of the 2⁵¹² programs.]

03 · components

§3 The five memory regions

RandomX juggles five distinct memory regions, each sized to a different level of the modern memory hierarchy. The layering is deliberate: it forces an implementation to have working DRAM, L3, L2, and L1 caches all firing at once. CPUs have all of these; ASICs don't have them in the same configuration.

32 bytes · changes every 2048 blocks

Key (K)

The seed for everything else. Derived from a "key block" — the most recent block whose height is divisible by 2048. Rotates ~every 2.8 days. When K rotates, everyone has to rebuild the cache and dataset.
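The rotation arithmetic can be sketched as follows. This follows the simplified rule stated above (most recent height divisible by 2048). The 120-second block time is Monero's target block interval and is an assumption not stated in the text, and the helper names are hypothetical. Production consensus code may apply extra details, such as a delay before a new key takes effect, which this sketch omits.

```python
EPOCH_BLOCKS = 2048        # from the text: the key changes every 2048 blocks
BLOCK_TIME_SECONDS = 120   # assumption: Monero's 2-minute target block time


def key_block_height(height: int) -> int:
    """Height of the most recent block whose height is divisible by EPOCH_BLOCKS."""
    return height - (height % EPOCH_BLOCKS)


def rotation_period_days() -> float:
    """How long one key stays in use, in days."""
    return EPOCH_BLOCKS * BLOCK_TIME_SECONDS / 86_400


print(key_block_height(5000))            # 4096
print(round(rotation_period_days(), 2))  # 2.84
```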

Cache

256 MB · per key

The Cache

Generated from K via Argon2d (a memory-hard KDF). Used to derive the much larger Dataset. Light-mode miners and verifiers keep just this 256 MB and regenerate Dataset entries on-demand.

Dataset

2.08 GB · per key

The Dataset

Built from the Cache by running 8 SuperscalarHash functions per entry. Read-only during hashing. Forces DRAM traffic — each hash reads ~16,384 entries (64 bytes each) over its full computation. Fast-mode only.
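The DRAM traffic implied by those numbers is easy to work out. The sketch below assumes one Dataset read per VM loop iteration, which matches the ~16,384 figure (8 programs × 2048 iterations, see §4). The hashrate passed in is a hypothetical input, not a benchmark.

```python
PROGRAMS_PER_HASH = 8          # from §4
ITERATIONS_PER_PROGRAM = 2048  # from §4
DATASET_ITEM_BYTES = 64        # from the text


def dataset_reads_per_hash() -> int:
    # Assumption: exactly one Dataset read per loop iteration.
    return PROGRAMS_PER_HASH * ITERATIONS_PER_PROGRAM


def dataset_bytes_per_hash() -> int:
    return dataset_reads_per_hash() * DATASET_ITEM_BYTES


def dram_bandwidth_bytes_per_second(hashes_per_second: float) -> float:
    """Random-access DRAM traffic needed to sustain a given hashrate."""
    return hashes_per_second * dataset_bytes_per_hash()


print(dataset_reads_per_hash())   # 16384
print(dataset_bytes_per_hash())   # 1048576, i.e. 1 MiB per hash
```

At a hypothetical 10,000 hashes per second, that is roughly 10 GB/s of random 64-byte reads, a pattern that favors a conventional DRAM subsystem over a narrow custom one.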

Scratchpad

2 MB · per hash

The Scratchpad

The VM's working memory. Read-write. Sized to fit in L3 cache. Initialized per-hash from input H via Blake2b → AesGenerator. Split into L1/L2/L3 regions mimicking CPU cache hierarchy.
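One way to picture the L1/L2/L3 split is as three nested address masks over the same 2 MB buffer. In this sketch the 16 KiB and 256 KiB level sizes are the default RandomX configuration values and are not stated in the text. The 8-byte alignment and the `scratchpad_offset` helper are likewise illustrative assumptions.

```python
SCRATCHPAD_L3 = 2 * 1024 * 1024  # 2 MB total, per the text
SCRATCHPAD_L2 = 256 * 1024       # assumption: default RandomX configuration
SCRATCHPAD_L1 = 16 * 1024        # assumption: default RandomX configuration
ALIGNMENT = 8                    # assumption: accesses are 8-byte words


def scratchpad_offset(address: int, level_size: int) -> int:
    """Wrap an arbitrary 64-bit value into an aligned offset inside one level."""
    return address & (level_size - 1) & ~(ALIGNMENT - 1)


value = 0xDEADBEEF
print(hex(scratchpad_offset(value, SCRATCHPAD_L1)))  # 0x3ee8
print(hex(scratchpad_offset(value, SCRATCHPAD_L3)))  # 0xdbee8
```

Because the levels are power-of-two sizes, an "L1" access touches only the first 16 KiB of the buffer, so most accesses stay in the hottest region while a minority range over the full 2 MB.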

Registers

~250 bytes · per hash

The VM register file

8 integer (r0-r7), 4 "f" float (f0-f3), 4 "e" float (e0-e3), 4 read-only "a" float (a0-a3), plus tracking registers. The fast working set, sized to fit comfortably in L1.
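The "~250 bytes" figure can be checked with a few lines of arithmetic. The sketch assumes each floating-point register is 128 bits wide (a pair of doubles), which the text does not state, and it ignores the tracking registers.

```python
INTEGER_REGISTERS = 8                     # r0-r7
FLOAT_GROUPS = {"f": 4, "e": 4, "a": 4}   # register counts per group, from the text
INTEGER_REGISTER_BYTES = 8                # 64-bit integers
FLOAT_REGISTER_BYTES = 16                 # assumption: two doubles per register


def register_file_bytes() -> int:
    integer_bytes = INTEGER_REGISTERS * INTEGER_REGISTER_BYTES
    float_bytes = sum(FLOAT_GROUPS.values()) * FLOAT_REGISTER_BYTES
    return integer_bytes + float_bytes


print(register_file_bytes())  # 256
```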

The memory hierarchy, drawn

An ASIC that wanted to compete would need 2 GB of fast SRAM (or DRAM with a wide bus), 2 MB of fast L3-equivalent storage, AND all the CPU instructions covered in §5. The economics never close.

04 · the hash flow

§4 From input to 256-bit output

One RandomX hash is not one program — it's eight, chained together. Each program reads/writes the scratchpad, then the next program's bytecode is generated from a hash of the previous VM register state. At the end, the entire scratchpad is fingerprinted with AES and combined with the final register file into a Blake2b 256-bit output. That is the value the miner compares to the network difficulty target.
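The chaining structure described above can be sketched as a toy. Nothing here is the real algorithm: `toy_execute` stands in for running a compiled VM program, BLAKE2b stands in for the AES scratchpad fingerprint, and the scratchpad is 256 bytes rather than 2 MB. The `meets_difficulty` check (hash read as a little-endian integer, multiplied by the difficulty, must stay below 2²⁵⁶) is how I understand Monero's target comparison, and should be treated as an assumption.

```python
import hashlib

PROGRAM_COUNT = 8  # from the text: eight chained programs per hash


def blake2b(data: bytes, size: int = 64) -> bytes:
    return hashlib.blake2b(data, digest_size=size).digest()


def toy_execute(program_seed: bytes, scratchpad: bytearray, registers: bytes) -> bytes:
    """Stand-in for one VM program: mixes the seed, registers and scratchpad."""
    mixed = blake2b(program_seed + registers + bytes(scratchpad[:64]))
    scratchpad[:64] = mixed   # pretend the program wrote to the scratchpad
    return mixed              # the new "register file"


def toy_hash(block_blob: bytes) -> bytes:
    seed = blake2b(block_blob)
    scratchpad = bytearray(seed * 4)   # toy 256-byte scratchpad
    registers = bytes(64)
    for _ in range(PROGRAM_COUNT):
        registers = toy_execute(seed, scratchpad, registers)
        seed = blake2b(registers)      # next program derives from register state
    fingerprint = blake2b(bytes(scratchpad))   # stand-in for the AES fingerprint
    return blake2b(fingerprint + registers, size=32)


def meets_difficulty(hash_bytes: bytes, difficulty: int) -> bool:
    # Assumption: little-endian interpretation, hash * difficulty < 2**256.
    return int.from_bytes(hash_bytes, "little") * difficulty < 2**256


digest = toy_hash(b"block header + nonce")
```

The point of the chain is that program N+1 cannot be generated until program N has finished, so the eight programs cannot be run in parallel within one hash.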

The 8-program chain is what gives RandomX its scale. Each program is ~256 instructions × 2048 iterations of the VM loop, so one hash executes 8 × 256 × 2048 ≈ 4 million instructions before the final fingerprint. That's why even modern CPUs manage only on the order of a thousand hashes per second per core, and why the algorithm is so hard to accelerate.
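The instruction count follows directly from the three figures in the paragraph above. The throughput estimate underneath is only a back-of-envelope sketch: the 4 billion VM instructions per second is a hypothetical number chosen for illustration, and real cores are also limited by memory latency.

```python
PROGRAMS_PER_HASH = 8
INSTRUCTIONS_PER_PROGRAM = 256
ITERATIONS_PER_PROGRAM = 2048


def instructions_per_hash() -> int:
    return PROGRAMS_PER_HASH * INSTRUCTIONS_PER_PROGRAM * ITERATIONS_PER_PROGRAM


def estimated_hashes_per_second(vm_instructions_per_second: float) -> float:
    """Upper bound on hashrate for a core retiring the given VM instruction rate."""
    return vm_instructions_per_second / instructions_per_hash()


print(instructions_per_hash())                   # 4194304
print(round(estimated_hashes_per_second(4e9)))   # 954
```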

05 · the vm

§5 What the virtual machine looks like

RandomX's virtual machine is designed to be a caricature of a modern CPU — it exercises exactly the features that distinguish general-purpose silicon from specialized accelerators. The instruction set is tiny (~30 opcodes), but each opcode is chosen to be something a CPU is already great at and an ASIC would have to expensively replicate.

A snippet of generated bytecode

    ; randomx program excerpt · 256 such instructions per program · regenerated every hash
    IADD_RS  r3, r5, SHFT 2      ; integer add with shift
    IXOR_R   r1, r0              ; integer xor
    IMUL_R   r7, r2              ; 64-bit multiply
    FADD_R   f1, a3              ; double-precision FP add
    FSUB_M   f0, L1[r4]          ; FP subtract from scratchpad L1
    FMUL_R   e2, a1              ; FP multiply
    FDIV_M   e0, L2[r6]          ; FP divide (IEEE 754, exact)
    FSQRT_R  e1                  ; FP square root
    CBRANCH  r5, 0x4f, target    ; conditional branch, hits branch predictor
    ISTORE   L3[r1], r7          ; scratchpad write
    ...
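To show what executing such bytecode involves, here is a minimal interpreter sketch for four of the integer instructions. The tuple encoding `(opcode, dst, src, arg)` and the `execute` function are assumptions for illustration. The semantics used (64-bit wraparound, shifted add, rotate by the low 6 bits of the source) reflect my reading of the instruction names, not a verified copy of the specification.

```python
MASK64 = (1 << 64) - 1


def rotate_right(value: int, count: int) -> int:
    count &= 63
    return ((value >> count) | (value << (64 - count))) & MASK64


def execute(program, registers):
    """Run a list of (opcode, dst, src, arg) tuples over eight 64-bit registers."""
    r = list(registers)
    for opcode, dst, src, arg in program:
        if opcode == "IADD_RS":      # dst += src << shift
            r[dst] = (r[dst] + (r[src] << arg)) & MASK64
        elif opcode == "IXOR_R":     # dst ^= src
            r[dst] ^= r[src]
        elif opcode == "IMUL_R":     # dst *= src (low 64 bits)
            r[dst] = (r[dst] * r[src]) & MASK64
        elif opcode == "IROR_R":     # dst = dst rotated right by src
            r[dst] = rotate_right(r[dst], r[src])
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return r


program = [("IADD_RS", 3, 5, 2), ("IXOR_R", 1, 0, 0), ("IMUL_R", 7, 2, 0)]
print(execute(program, [1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 3, 28, 5, 6, 7, 24]
```

Every instruction pays for a dispatch through the `if` chain here, which is the overhead a JIT removes.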

What each instruction class buys

Class	Examples	What it forces hardware to have
Integer math	IADD, IMUL, IXOR, IROR	A 64-bit ALU with mult/shift/rotate — every CPU has it; ASICs would need to add it.
Floating point	FADD, FSUB, FMUL, FDIV, FSQRT	IEEE 754 double-precision with correct rounding modes. Killer for GPUs (which favor single-precision) and WebAssembly (no directed rounding).
Memory R/W	ISTORE, IADD_M, FADD_M	A working cache hierarchy. Reads at L1/L2/L3 latency. Writes that update cached state.
Branches	CBRANCH	A branch predictor. Each branch is taken with probability 1/256 (~0.4%), so misprediction is rare but not zero, which exercises the prediction logic.
Reciprocal	IMUL_RCP	Multiplication by precomputed reciprocal (avoids slow integer divide). Loads a 64-bit literal into a register.
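The reciprocal row can be illustrated with a short sketch. It assumes the reciprocal is defined as 2^x divided by the 32-bit immediate, for the largest x that keeps the result below 2⁶⁴, and that zero and powers of two are skipped. Both points are my understanding of the instruction and should be checked against the specification. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def reciprocal(divisor: int) -> int:
    """Largest floor(2**x / divisor) that still fits in 64 bits."""
    if divisor == 0 or (divisor & (divisor - 1)) == 0:
        raise ValueError("zero and powers of two have no useful reciprocal here")
    exponent = 64
    # Raise the exponent while the next quotient still fits in 64 bits.
    while (1 << (exponent + 1)) // divisor < (1 << 64):
        exponent += 1
    return (1 << exponent) // divisor


def imul_rcp(dst: int, divisor: int) -> int:
    """Multiply a 64-bit register by the precomputed reciprocal constant."""
    return (dst * reciprocal(divisor)) & MASK64


print(reciprocal(3) == (1 << 65) // 3)  # True
```

The reciprocal depends only on the immediate, so it is computed once when the program is compiled and then used as an ordinary 64-bit multiply operand.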

JIT compilation · the speed unlock

RandomX bytecode is not interpreted: it's JIT-compiled to native machine code for the host CPU on every hash. The reference implementation includes JIT compilers for x86-64, ARM64, and (recently) RISC-V. The compiled code runs at full speed; the interpreter path is the fallback for platforms without a JIT and is ~10× slower. This is why hashing is so much faster on platforms with the JIT, and why the algorithm needs a fairly recent CPU to be efficient.
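The interpret-versus-compile distinction can be mimicked in a few lines. This is only an analogy: the "JIT" below emits Python source and compiles it with the built-in `compile`, whereas the reference implementation emits native machine code. The three-instruction program, the `TEMPLATES` table, and both function names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1
PROGRAM = [("IADD_RS", 3, 5, 2), ("IXOR_R", 1, 0, 0), ("IMUL_R", 7, 2, 0)]


def interpret(program, registers, iterations):
    r = list(registers)
    for _ in range(iterations):
        for op, dst, src, arg in program:   # decode cost paid on every iteration
            if op == "IADD_RS":
                r[dst] = (r[dst] + (r[src] << arg)) & MASK64
            elif op == "IXOR_R":
                r[dst] ^= r[src]
            elif op == "IMUL_R":
                r[dst] = (r[dst] * r[src]) & MASK64
    return r


TEMPLATES = {
    "IADD_RS": "r[{dst}] = (r[{dst}] + (r[{src}] << {arg})) & MASK64",
    "IXOR_R": "r[{dst}] ^= r[{src}]",
    "IMUL_R": "r[{dst}] = (r[{dst}] * r[{src}]) & MASK64",
}


def compile_program(program):
    """Translate the program once into straight-line code with no decode step."""
    body = "\n".join(
        "        " + TEMPLATES[op].format(dst=dst, src=src, arg=arg)
        for op, dst, src, arg in program
    )
    source = (
        "def run(registers, iterations):\n"
        "    r = list(registers)\n"
        "    for _ in range(iterations):\n"
        f"{body}\n"
        "    return r\n"
    )
    namespace = {"MASK64": MASK64}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["run"]


run = compile_program(PROGRAM)
start = [1, 2, 3, 4, 5, 6, 7, 8]
print(run(start, 2048) == interpret(PROGRAM, start, 2048))  # True
```

The translation cost is paid once per program and amortized over 2048 loop iterations, which is why compiling on every hash is still worthwhile.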

06 · modes

§6 Fast mode for mining, light mode for verifying

A subtle but elegant property of RandomX: verification doesn't need to be as expensive as mining. The 2 GB Dataset is only required for fast mode. In light mode, the verifier keeps only the 256 MB Cache and computes Dataset entries on-demand as the VM requests them. Same answer, much less memory — at the cost of being too slow to mine competitively.

Mode	RAM	Use case
Fast mode	~2.08 GB	Mining. The entire Dataset is precomputed and held in RAM. Each hash reads ~16,384 entries directly. ~4-6× faster than light mode.
Light mode	~256 MB	Verification. Only the Cache is held. Dataset entries are regenerated on-demand by running 8 SuperscalarHash invocations from the Cache. Slow, but lets any node verify a block without 2 GB of RAM.
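The two modes are a classic precompute-versus-recompute trade-off, sketched below at toy scale. BLAKE2b stands in for both Argon2d (cache fill) and SuperscalarHash (item derivation). The cache has 16 blocks instead of 256 MB, and the class and function names are hypothetical.

```python
import hashlib

CACHE_ACCESSES = 8   # from the text: 8 SuperscalarHash invocations per entry
ITEM_BYTES = 64      # Dataset entry size, from the text


def make_cache(key: bytes, blocks: int = 16):
    """Toy stand-in for the Argon2d-filled Cache."""
    return [hashlib.blake2b(key + bytes([i]), digest_size=ITEM_BYTES).digest()
            for i in range(blocks)]


def dataset_item(cache, index: int) -> bytes:
    """Derive one Dataset entry from the Cache (stand-in for SuperscalarHash)."""
    item = index.to_bytes(8, "little").ljust(ITEM_BYTES, b"\0")
    for round_number in range(CACHE_ACCESSES):
        pick = (index + int.from_bytes(item[:4], "little") + round_number) % len(cache)
        item = hashlib.blake2b(item + cache[pick], digest_size=ITEM_BYTES).digest()
    return item


class FastMode:
    """Precompute every entry up front: more memory, cheap reads."""
    def __init__(self, cache, item_count: int):
        self.dataset = [dataset_item(cache, i) for i in range(item_count)]

    def read(self, index: int) -> bytes:
        return self.dataset[index]


class LightMode:
    """Keep only the Cache: less memory, each read recomputes the entry."""
    def __init__(self, cache):
        self.cache = cache

    def read(self, index: int) -> bytes:
        return dataset_item(self.cache, index)


cache = make_cache(b"toy key")
fast, light = FastMode(cache, 64), LightMode(cache)
print(all(fast.read(i) == light.read(i) for i in range(64)))  # True
```

Both modes return identical entries, so a verifier in light mode reaches the same hash as a miner in fast mode while paying eight derivation rounds on every read.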

Why this split is brilliant

If fast mode required only 256 MB, embedded devices and old CPUs could mine — but ASICs would also have a much easier target. If light mode required 2 GB, every monerod verifying blocks would need 2 GB just for PoW — pricing out lightweight nodes. Splitting the cost keeps mining hard and verification accessible.

07 · the result

§7 Five years in: is it working?

RandomX has been running on Monero mainnet since November 30, 2019. The empirical record:

✓

No ASICs detected

Since activation, no confirmed RandomX ASIC for Monero has appeared on the market. Compared with the CryptoNight era, when specialized hardware forced an algorithm change every 6-9 months, this is a stark difference.

✓

GPUs are uncompetitive

GPUs underperform CPUs at RandomX by ~3-5×. The combination of integer arithmetic, FP64 with directed rounding, and random branching is exactly the GPU's weak spot. GPU miner developers have essentially given up trying.

✓

Audited four times

Trail of Bits, X41 D-Sec, Kudelski Security, and Quarkslab have all independently reviewed the algorithm. No critical findings.

The downsides

−

Mineable malware

Because efficient mining needs ≥2 GB of RAM, RandomX is, ironically, somewhat easier to detect as cryptojacking malware than smaller-memory algorithms. The 2 GB allocation is one tell; detection tools like "RandomX Sniffer" also look for runtime signatures such as frequent floating-point rounding-mode changes.

−

Web mining impossible

Browser sandboxes don't support FP64 directed rounding, and 2 GB allocations are blocked. WebAssembly mining (the Coinhive era) is structurally dead — by design.

−

32-bit can't compete

RandomX requires 64-bit integer mults and 2+ GB virtual address space. Old hardware, embedded chips, and IoT devices are effectively excluded from mining — which the Monero community considers acceptable.