// schema · randomx · monero pow algorithm

A proof of work where
the work is the CPU itself.

RandomX is Monero's proof-of-work algorithm, activated November 30, 2019, replacing CryptoNight. It's the answer to a question other coins didn't seriously try to solve: "What if instead of fighting ASICs, we designed a PoW where a general-purpose CPU is genuinely the optimal hardware?" The trick is brutal in its simplicity: generate a different program for every hash, then run it through a virtual machine that exercises everything modern CPUs are good at — branch prediction, floating-point, cache hierarchy, out-of-order execution. An ASIC would have to recreate a CPU to compete. So it doesn't.

since nov 30, 2019 · ~2.08 GB dataset · ~30 instructions · jit compiled · 4 formal audits
01 · the problem

§1 Why Monero needed a new PoW

Every PoW chain eventually faces the ASIC question. Bitcoin embraced them. Monero refused. For five years (2014-2019) Monero played whack-a-mole with the CryptoNight family — tweaking parameters every six months whenever ASICs appeared. It was exhausting, partially effective, and clearly not sustainable. RandomX is the strategic pivot: instead of hiding from ASICs, make the algorithm so hostile to specialization that no ASIC can beat a commodity CPU.

α

The ASIC pressure

Custom silicon optimized for one hash function (SHA-256, scrypt, etc.) outperforms general-purpose hardware by 100,000× or more. Result: mining centralizes around the few who can afford the silicon.

β

What broke CryptoNight

CryptoNight tried to be memory-hard. ASICs eventually got built anyway (Bitmain Antminer X3, 2018). The Monero community responded with emergency forks that bricked the ASICs — but the cycle was unwinnable.

γ

The pivot

Tevador, SChernykh, hyc, and others spent ~9 months designing RandomX. The premise: don't try to dodge ASICs — design a PoW where a CPU has every structural advantage and silicon would have to recreate a CPU to compete.

The strategic shift in one sentence

CryptoNight was an obstacle course hoping ASICs couldn't navigate it. RandomX is an art exhibit where the only valid medium happens to be commodity silicon. Anything less general — an ASIC, an FPGA, a GPU — necessarily produces an inferior copy.

02 · the big idea

§2 Random programs, not random data

Classical PoW (Bitcoin's SHA-256d, scrypt, etc.) runs the same function on different inputs. RandomX runs different functions on the same input. Every hash attempt generates a fresh ~256-instruction program, JIT-compiles it to native machine code, and executes it on a virtual machine. The space of possible programs is ~2⁵¹², far too many for an ASIC to pre-bake.

Why "random programs" defeats ASICs

An ASIC achieves its speed by hard-coding the operations it performs. If the operations change every hash, the ASIC's hard-coded circuitry is useless — it would have to implement an instruction decoder, register file, ALU, FPU, branch predictor, and memory subsystem. At which point it has reinvented a CPU — and a less efficient one than what Intel and AMD already ship.

BITCOIN · SHA-256d · SAME function
SHA-256(SHA-256(block || nonce)) · ~2 billion times per second on an ASIC
→ hardware can hard-wire the entire computation

vs

MONERO · RandomX · DIFFERENT function each time
RandomProgram_i(scratchpad, registers) · ~thousands of hashes/sec on a CPU
→ hardware needs to handle any of 2⁵¹² programs

EACH HASH = NEW PROGRAM
prog #1 · IMUL_R r3,r1 · FADD f0,a2
prog #2 · IXOR_R r5,r2 · CBRANCH 0x42
prog #3 · FSQRT_R e1 · ISTORE L1
prog #4 · FMUL_R e3,a1 · IADD_RS r7
prog #5 · FDIV_M e0,L2 · IROR_R r4
prog #N… · 2⁵¹² total

Every hash compiles a fresh ~256-instruction program to native code, runs it, hashes the result.
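To make "a different program for every hash" concrete, here is a minimal Python sketch of deriving an instruction list deterministically from a seed. It is a toy under stated assumptions, not the RandomX generator: the real algorithm uses an AES-based generator and its own opcode frequency table, whereas this sketch uses Blake2b in counter mode, a hypothetical eight-opcode subset, and a made-up field layout.

```python
import hashlib
import struct

# Hypothetical subset of opcodes, chosen only for illustration.
OPCODES = ["IADD_RS", "IXOR_R", "IMUL_R", "IROR_R",
           "FADD_R", "FMUL_R", "CBRANCH", "ISTORE"]
PROGRAM_LENGTH = 256  # instructions per program, as stated in the text


def generate_program(seed: bytes, length: int = PROGRAM_LENGTH):
    """Derive a deterministic pseudo-random instruction list from a seed.

    Stand-in generator: Blake2b in counter mode (RandomX itself uses AES).
    """
    program = []
    counter = 0
    while len(program) < length:
        block = hashlib.blake2b(seed + struct.pack("<Q", counter),
                                digest_size=64).digest()
        counter += 1
        # Treat each 8-byte word as one toy instruction:
        # opcode byte, dst byte, src byte, one unused byte, 32-bit immediate.
        for offset in range(0, 64, 8):
            opcode, dst, src, _unused, imm = struct.unpack_from("<BBBBI", block, offset)
            program.append((OPCODES[opcode % len(OPCODES)], dst % 8, src % 8, imm))
            if len(program) == length:
                break
    return program


if __name__ == "__main__":
    for nonce in (1, 2):
        prog = generate_program(b"block_blob" + nonce.to_bytes(4, "little"))
        print(nonce, prog[:3])
```

Changing a single bit of the seed yields an unrelated instruction sequence, which is the property that keeps hardware from hard-wiring any one program.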
03 · components

§3 The five memory regions

RandomX juggles five distinct memory regions, each sized to a different level of the modern memory hierarchy. The layering is deliberate: it forces an implementation to have working DRAM, L3, L2, and L1 caches all firing at once. CPUs have all of these; ASICs don't have them in the same configuration.

K
32 bytes · changes every 2048 blocks

Key

The seed for everything else. Derived from a "key block" — the most recent block whose height is divisible by 2048. Rotates ~every 2.8 days. When K rotates, everyone has to rebuild the cache and dataset.

Cache
256 MB · per key

The Cache

Generated from K via Argon2d (a memory-hard KDF). Used to derive the much larger Dataset. Light-mode miners and verifiers keep just this 256 MB and regenerate Dataset entries on-demand.

Dataset
2.08 GB · per key

The Dataset

Built from the Cache by running 8 SuperscalarHash functions per entry. Read-only during hashing. Forces DRAM traffic — each hash reads ~16,384 entries (64 bytes each) over its full computation. Fast-mode only.

Scratchpad
2 MB · per hash

The Scratchpad

The VM's working memory. Read-write. Sized to fit in L3 cache. Initialized per-hash from input H via Blake2b → AesGenerator. Split into L1/L2/L3 regions mimicking CPU cache hierarchy.

Registers
~250 bytes · per hash

The VM register file

8 integer (r0-r7), 4 "f" float (f0-f3), 4 "e" float (e0-e3), 4 "a" float address (a0-a3), plus tracking registers. The fast working set, sized to fit comfortably in L1.
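The key rotation rule described for K can be sketched in a few lines of Python. This is only the rule as stated above (most recent height divisible by 2048). A production node may apply extra details, such as a delay before a new key takes effect, that are not modeled here. The 120-second block time is an assumption based on Monero's 2-minute target.

```python
KEY_EPOCH_BLOCKS = 2048      # from the text: K changes every 2048 blocks
BLOCK_TIME_SECONDS = 120     # assumption: Monero's 2-minute block target


def key_block_height(height: int) -> int:
    """Most recent height at or below `height` that is divisible by 2048."""
    if height < 0:
        raise ValueError("height must be non-negative")
    return height - (height % KEY_EPOCH_BLOCKS)


def epoch_days() -> float:
    """Approximate wall-clock length of one key epoch, in days."""
    return KEY_EPOCH_BLOCKS * BLOCK_TIME_SECONDS / 86_400


if __name__ == "__main__":
    print(key_block_height(3_000_000))
    print(round(epoch_days(), 2))
```

Every miner and verifier that applies the same rule to the same chain height derives the same K, so they all rebuild the Cache (and, in fast mode, the Dataset) at the same moment.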

The memory hierarchy, drawn

RANDOMX MEMORY · MAPPED TO CPU CACHE LEVELS
DRAM · the 2.08 GB Dataset · main memory · ~50-100ns latency · forces DDR access patterns
L3 CACHE · 2 MB Scratchpad · CPU last-level cache · ~10-15ns latency · the VM's working area
L2 CACHE · 256 KB scratchpad region · ~3-5ns latency · accessed by ISCRATCH instructions
L1 CACHE · 16 KB hot region + registers · ~1ns latency · most-frequently accessed scratchpad bytes

An ASIC that wanted to compete would need 2 GB of fast SRAM (or DRAM with a wide bus), 2 MB of fast L3-equivalent storage, AND all the CPU instructions covered in §5. The economics never close.
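One way to picture the L1/L2/L3 split of the Scratchpad is as three address masks over the same 2 MB buffer: the smaller the mask, the more likely the access stays in a faster cache level. The Python sketch below uses the region sizes quoted above. The 8-byte alignment and the function name are assumptions made for illustration.

```python
L1_SIZE = 16 * 1024          # hot region, per the diagram above
L2_SIZE = 256 * 1024         # L2-sized region
L3_SIZE = 2 * 1024 * 1024    # the whole Scratchpad
ALIGNMENT = 8                # assumption: accesses are 8-byte aligned


def scratchpad_offset(address: int, level_size: int) -> int:
    """Fold an arbitrary address into an aligned offset inside one region.

    `level_size` must be a power of two so that (level_size - 1) is a bit mask.
    """
    if level_size & (level_size - 1):
        raise ValueError("level_size must be a power of two")
    return address & (level_size - 1) & ~(ALIGNMENT - 1)


if __name__ == "__main__":
    addr = 0xDEADBEEF
    for name, size in (("L1", L1_SIZE), ("L2", L2_SIZE), ("L3", L3_SIZE)):
        print(name, hex(scratchpad_offset(addr, size)))
```

Because each region is a prefix of the next larger one, a random register value can be turned into a valid offset with two AND operations and no bounds check.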

04 · the hash flow

§4 From input to 256-bit output

One RandomX hash is not one program — it's eight, chained together. Each program reads/writes the scratchpad, then the next program's bytecode is generated from a hash of the previous VM register state. At the end, the entire scratchpad is fingerprinted with AES and combined with the final register file into a Blake2b 256-bit output. That is the value the miner compares to the network difficulty target.

INPUT H · block_blob || nonce
STEP 1 · seed · Blake2b-512(H) → 64-byte seed → AesGenerator
STEP 2 · fill scratchpad · AES expand → 2 MB pad · throughput ~20 GB/s on AES-NI
LOOP · 8 times
  STEP 3 · gen program · ~256 instructions · bytecode from AES generator
  STEP 4 · JIT compile · → native x86-64 / ARM64 · runs at full CPU speed
  STEP 5 · execute · 2048 iters · VM runs, reads Dataset, mutates Scratchpad
  STEP 6 · feed forward · register_hash = Blake2b(reg_file) · seeds the next program's generator
FINAL · after 8 programs · AES-fingerprint(scratchpad) ⊕ reg_file → Blake2b-256 → 256-bit output · compare to network target
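The control flow of the steps above can be sketched in Python. This is a structural toy, not RandomX: Blake2b in counter mode stands in for the AES generator and the AES fingerprint, `toy_execute` is a made-up mixing loop in place of the real VM, there is no Dataset and no JIT, and the final combination uses concatenation. All function names are hypothetical. The output will not match any real RandomX hash.

```python
import hashlib

MASK64 = (1 << 64) - 1


def blake2b(data: bytes, size: int = 64) -> bytes:
    return hashlib.blake2b(data, digest_size=size).digest()


def expand(seed: bytes, n_bytes: int) -> bytes:
    """Stand-in for the AES generator: Blake2b in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += blake2b(seed + counter.to_bytes(8, "little"))
        counter += 1
    return bytes(out[:n_bytes])


def toy_execute(program: bytes, registers: list, scratchpad: bytearray,
                iterations: int) -> None:
    """Stand-in for the VM: mixes 8 integer registers with the scratchpad."""
    n_words = len(scratchpad) // 8
    for _ in range(iterations):
        for i in range(0, len(program), 8):          # one 8-byte toy instruction
            word = int.from_bytes(program[i:i + 8], "little")
            dst = word % 8
            start = ((registers[dst] ^ word) % n_words) * 8
            value = int.from_bytes(scratchpad[start:start + 8], "little")
            registers[dst] = (registers[dst] * 3 + value + word) & MASK64
            scratchpad[start:start + 8] = registers[dst].to_bytes(8, "little")


def toy_hash(blob: bytes, programs: int = 8, program_len: int = 256,
             iterations: int = 2048,
             scratchpad_size: int = 2 * 1024 * 1024) -> bytes:
    seed = blake2b(blob)                                        # step 1
    scratchpad = bytearray(expand(seed + b"pad", scratchpad_size))  # step 2
    registers = [0] * 8
    generator_seed = seed
    for _ in range(programs):                                   # loop, 8 times
        program = expand(generator_seed + b"prog", program_len * 8)  # step 3
        toy_execute(program, registers, scratchpad, iterations)      # steps 4-5
        reg_file = b"".join(r.to_bytes(8, "little") for r in registers)
        generator_seed = blake2b(reg_file)                      # step 6
    reg_file = b"".join(r.to_bytes(8, "little") for r in registers)
    fingerprint = blake2b(bytes(scratchpad))    # stand-in for the AES fingerprint
    return blake2b(fingerprint + reg_file, size=32)             # 256-bit output


if __name__ == "__main__":
    # Reduced parameters: the defaults are slow in pure Python.
    print(toy_hash(b"block_blob|nonce=1", programs=2, program_len=16,
                   iterations=8, scratchpad_size=4096).hex())
```

The point to notice is the feed-forward in step 6: program N+1 cannot be generated until program N has finished, so the eight programs cannot be run in parallel within one hash.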

The 8-program chain is what gives RandomX its scale. Each program is ~256 instructions × 2048 iterations of the VM loop, so one hash executes ~4 million instructions before the final fingerprint. That's why even modern CPUs only manage thousands of hashes per second per core — and why the algorithm is so hard to accelerate.
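The arithmetic behind the "~4 million instructions" figure, spelled out in Python. The per-hash Dataset traffic assumes one 64-byte Dataset read per VM iteration, which is consistent with the ~16,384 entries quoted in §3. The 1,000 H/s rate at the end is a hypothetical round number for illustration, not a benchmark.

```python
PROGRAMS_PER_HASH = 8
INSTRUCTIONS_PER_PROGRAM = 256
ITERATIONS_PER_PROGRAM = 2048
DATASET_ENTRY_BYTES = 64

# 8 programs x 256 instructions x 2048 loop iterations
instructions_per_hash = (PROGRAMS_PER_HASH
                         * INSTRUCTIONS_PER_PROGRAM
                         * ITERATIONS_PER_PROGRAM)

# Assumption: one Dataset read per iteration of the VM loop
dataset_reads_per_hash = PROGRAMS_PER_HASH * ITERATIONS_PER_PROGRAM
dataset_bytes_per_hash = dataset_reads_per_hash * DATASET_ENTRY_BYTES

HYPOTHETICAL_HASHES_PER_SECOND = 1_000  # illustrative only
vm_instructions_per_second = instructions_per_hash * HYPOTHETICAL_HASHES_PER_SECOND

if __name__ == "__main__":
    print(f"{instructions_per_hash:,} VM instructions per hash")
    print(f"{dataset_reads_per_hash:,} Dataset reads per hash")
    print(f"{dataset_bytes_per_hash:,} Dataset bytes per hash")
    print(f"{vm_instructions_per_second:,} VM instructions per second")
```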

05 · the vm

§5 What the virtual machine looks like

RandomX's virtual machine is designed to be a caricature of a modern CPU — it exercises exactly the features that distinguish general-purpose silicon from specialized accelerators. The instruction set is tiny (~30 opcodes), but each opcode is chosen to be something a CPU is already great at and an ASIC would have to expensively replicate.

A snippet of generated bytecode

; randomx program excerpt · 256 such instructions per program · regenerated every hash
IADD_RS  r3, r5, SHFT 2      ; integer add with shift
IXOR_R   r1, r0              ; integer xor
IMUL_R   r7, r2              ; 64-bit multiply
FADD_R   f1, a3              ; double-precision FP add
FSUB_M   f0, L1[r4]          ; FP subtract from scratchpad L1
FMUL_R   e2, a1              ; FP multiply
FDIV_M   e0, L2[r6]          ; FP divide (IEEE 754, exact)
FSQRT_R  e1                  ; FP square root
CBRANCH  r5, 0x4f, target    ; conditional branch, hits branch predictor
ISTORE   L3[r1], r7          ; scratchpad write
...

What each instruction class buys

| Class | Examples | What it forces hardware to have |
| --- | --- | --- |
| Integer math | IADD, IMUL, IXOR, IROR | A 64-bit ALU with mult/shift/rotate. Every CPU has it; ASICs would need to add it. |
| Floating point | FADD, FSUB, FMUL, FDIV, FSQRT | IEEE 754 double-precision with correct rounding modes. Killer for GPUs (which favor single-precision) and WebAssembly (no directed rounding). |
| Memory R/W | ISTORE, IADD_M, FADD_M | A working cache hierarchy. Reads at L1/L2/L3 latency. Writes that update cached state. |
| Branches | CBRANCH | A branch predictor. Branches are taken with ~1% probability, so misprediction is rare but not zero, which exercises the prediction logic. |
| Reciprocal | IMUL_RCP | Multiplication by precomputed reciprocal (avoids slow integer divide). Loads a 64-bit literal into a register. |
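To show what the integer class asks of the hardware, here is a minimal Python interpreter for four of the opcodes in the excerpt. The semantics are a simplified reading of the mnemonics (wrap-around 64-bit arithmetic, register operands only). The tuple encoding and the function names are assumptions, and the real instruction definitions carry more detail.

```python
MASK64 = (1 << 64) - 1  # registers are 64-bit; arithmetic wraps around


def rotr64(value: int, count: int) -> int:
    """Rotate a 64-bit value right by `count` bits."""
    count %= 64
    return ((value >> count) | (value << (64 - count))) & MASK64


def step(regs: list, instr: tuple) -> None:
    """Execute one toy instruction (opcode, dst, src, imm) in place."""
    op, dst, src, imm = instr
    if op == "IADD_RS":        # dst += src << shift
        regs[dst] = (regs[dst] + (regs[src] << imm)) & MASK64
    elif op == "IXOR_R":       # dst ^= src
        regs[dst] ^= regs[src]
    elif op == "IMUL_R":       # dst *= src, keeping the low 64 bits
        regs[dst] = (regs[dst] * regs[src]) & MASK64
    elif op == "IROR_R":       # rotate dst right by the low 6 bits of src
        regs[dst] = rotr64(regs[dst], regs[src])
    else:
        raise ValueError(f"unsupported opcode: {op}")


if __name__ == "__main__":
    regs = [0xFF, 0x0F, 2, 1, 0, 3, 0, 1 << 63]
    for instr in [("IADD_RS", 3, 5, 2), ("IXOR_R", 1, 0, 0), ("IMUL_R", 7, 2, 0)]:
        step(regs, instr)
    print([hex(r) for r in regs])
```

Each branch of `step` maps to a single instruction on a 64-bit CPU, which is why this class costs a general-purpose core almost nothing.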

JIT compilation · the speed unlock

RandomX bytecode is not interpreted: it is JIT-compiled to native machine code for the host CPU on every hash. The reference implementation includes JIT compilers for x86-64, ARM64, and (recently) RISC-V. The compiled code runs at full speed; the interpreter path is the fallback for platforms without a JIT and is ~10× slower. This is why hashing is so much faster on platforms with a JIT, and why the algorithm needs a fairly recent CPU to be efficient.
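The interpreter-versus-JIT distinction is "decode every instruction on every iteration" versus "translate once, then run the translated code 2048 times". The Python sketch below illustrates only that idea. It emits Python source and `exec`s it, whereas a real JIT emits machine code. The two-opcode toy program and all names are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Toy program: (opcode, dst, src). Two integer opcodes keep the sketch short.
PROGRAM = [("IXOR_R", 1, 0), ("IMUL_R", 7, 2), ("IXOR_R", 7, 1)]


def interpret(program, regs, iterations):
    """Interpreter path: re-decode every instruction on every iteration."""
    regs = list(regs)
    for _ in range(iterations):
        for op, dst, src in program:
            if op == "IXOR_R":
                regs[dst] ^= regs[src]
            elif op == "IMUL_R":
                regs[dst] = (regs[dst] * regs[src]) & MASK64
            else:
                raise ValueError(f"unsupported opcode: {op}")
    return regs


# One source-code template per opcode, filled in at translation time.
TEMPLATES = {
    "IXOR_R": "    r[{dst}] ^= r[{src}]",
    "IMUL_R": "    r[{dst}] = (r[{dst}] * r[{src}]) & MASK64",
}


def compile_program(program):
    """'JIT' path: translate the whole program once into a Python function."""
    lines = ["def body(r):", "    pass"]
    lines += [TEMPLATES[op].format(dst=dst, src=src) for op, dst, src in program]
    namespace = {"MASK64": MASK64}
    exec("\n".join(lines), namespace)
    return namespace["body"]


def run_compiled(program, regs, iterations):
    regs = list(regs)
    body = compile_program(program)   # decoding cost is paid once, here
    for _ in range(iterations):
        body(regs)                    # no per-instruction dispatch in the loop
    return regs


if __name__ == "__main__":
    start = [3, 5, 7, 0, 0, 0, 0, 11]
    print(interpret(PROGRAM, start, 2048) == run_compiled(PROGRAM, start, 2048))
```

Both paths must produce identical register state. The compiled path simply removes the dispatch overhead from the inner loop, which is where nearly all of the work happens.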

06 · modes

§6 Fast mode for mining, light mode for verifying

A subtle but elegant property of RandomX: verification doesn't need to be as expensive as mining. The 2 GB Dataset is only required for fast mode. In light mode, the verifier keeps only the 256 MB Cache and computes Dataset entries on-demand as the VM requests them. Same answer, much less memory — at the cost of being too slow to mine competitively.

| Mode | RAM | Use case |
| --- | --- | --- |
| Fast mode | ~2.08 GB | Mining. The entire Dataset is precomputed and held in RAM. Each hash reads ~16,384 entries directly. ~4-6× faster than light mode. |
| Light mode | ~256 MB | Verification. Only the Cache is held. Dataset entries are regenerated on-demand by running 8 SuperscalarHash invocations from the Cache. Slow, but lets any node verify a block without 2 GB of RAM. |
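The fast/light trade-off is a classic precompute-versus-recompute choice, sketched below in Python. The per-entry derivation is a Blake2b stand-in, not SuperscalarHash, and the class names and tiny sizes are hypothetical. The sketch shows only that both modes return the same entry for the same index.

```python
import hashlib

ENTRY_BYTES = 64  # Dataset entries are 64 bytes each, per the text


def dataset_entry(cache: bytes, index: int) -> bytes:
    """Derive one entry from the Cache (Blake2b stand-in for SuperscalarHash)."""
    return hashlib.blake2b(cache + index.to_bytes(8, "little"),
                           digest_size=ENTRY_BYTES).digest()


class FastDataset:
    """Fast mode: pay the memory cost once, then every read is a lookup."""

    def __init__(self, cache: bytes, n_entries: int):
        self.entries = [dataset_entry(cache, i) for i in range(n_entries)]

    def read(self, index: int) -> bytes:
        return self.entries[index]


class LightDataset:
    """Light mode: keep only the Cache and recompute entries on demand."""

    def __init__(self, cache: bytes, n_entries: int):
        self.cache = cache
        self.n_entries = n_entries

    def read(self, index: int) -> bytes:
        if not 0 <= index < self.n_entries:
            raise IndexError(index)
        return dataset_entry(self.cache, index)


if __name__ == "__main__":
    cache = hashlib.blake2b(b"toy key K").digest()   # tiny toy Cache
    fast, light = FastDataset(cache, 1024), LightDataset(cache, 1024)
    print(all(fast.read(i) == light.read(i) for i in range(1024)))
```

A verifier computes one hash per block and can afford the recomputation. A miner computes hashes continuously, so the one-time cost of filling the table pays for itself almost immediately.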

Why this split is brilliant

If fast mode required only 256 MB, embedded devices and old CPUs could mine — but ASICs would also have a much easier target. If light mode required 2 GB, every monerod verifying blocks would need 2 GB just for PoW — pricing out lightweight nodes. Splitting the cost keeps mining hard and verification accessible.

07 · the result

§7 Five years in: is it working?

RandomX has been running on Monero mainnet since November 30, 2019. The empirical record:

No ASICs detected

Years after activation, no confirmed Monero RandomX ASIC has appeared on the market. Compared to the cycle of attacks against CryptoNight every 6-9 months, this is a stark difference.

GPUs are uncompetitive

GPUs underperform CPUs at RandomX by ~3-5×. The combination of integer arithmetic, FP64 with directed rounding, and random branching is exactly the GPU's weak spot. NVIDIA and AMD essentially gave up trying.

Audited four times

Trail of Bits, X41 D-Sec, Kudelski Security, and Quarkslab have all independently reviewed the algorithm. No critical findings.

The downsides

Mineable malware

Because efficient mining needs ≥2 GB of RAM, RandomX-based cryptojacking malware is, ironically, somewhat easier to detect than malware built on smaller-memory algorithms. Tools like "RandomX Sniffer" use the 2 GB allocation as a tell.

Web mining impossible

Browser sandboxes don't support FP64 directed rounding, and 2 GB allocations are blocked. WebAssembly mining (the Coinhive era) is structurally dead — by design.

32-bit can't compete

RandomX requires 64-bit integer multiplication and 2+ GB of virtual address space. Old hardware, embedded chips, and IoT devices are effectively excluded from mining, which the Monero community considers acceptable.