Chapter 4

Data Provenance and Markets

Modern AI is built on scraped web data, licensed corpora, and user-generated content, often with murky attribution. Artists, writers, and developers increasingly ask who profited from their work and whether anyone obtained their consent. Blockchains offer timestamped records and programmable royalty splits, but they cannot prove what happened off chain: a ledger entry shows that a claim was registered at a given time, not that the claim is true.
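To make "timestamped records and programmable royalty splits" concrete, here is a minimal sketch in plain Python. It does not talk to any real chain; the names `ProvenanceRecord`, `make_record`, `matches`, and `split_royalty` are hypothetical, and the timestamp is supplied by the caller where a real ledger would use the block time.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical registry entry; a real chain would store something similar."""
    content_hash: str  # SHA-256 hex digest of the work's bytes
    creator: str       # illustrative creator identifier
    timestamp: int     # Unix seconds (assumption: supplied by the ledger)


def make_record(content: bytes, creator: str, timestamp: int) -> ProvenanceRecord:
    # Only the hash is recorded, so the work itself stays off chain.
    return ProvenanceRecord(hashlib.sha256(content).hexdigest(), creator, timestamp)


def matches(record: ProvenanceRecord, content: bytes) -> bool:
    # Shows that these exact bytes were registered, not who really made them.
    return hashlib.sha256(content).hexdigest() == record.content_hash


def split_royalty(amount: int, shares_bps: dict[str, int]) -> dict[str, int]:
    """Split an integer amount by shares in basis points (10,000 = 100%)."""
    if sum(shares_bps.values()) != 10_000:
        raise ValueError("shares must sum to 10,000 basis points")
    payouts = {payee: amount * bps // 10_000 for payee, bps in shares_bps.items()}
    # Integer division can leave a remainder; give it to the first payee.
    first_payee = next(iter(shares_bps))
    payouts[first_payee] += amount - sum(payouts.values())
    return payouts
```

Note what the sketch can and cannot establish: `matches` confirms that a given file is the one that was registered, but nothing in the record stops someone from registering a work they did not create.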

Data markets imagine a world where you license a narrow slice of your inbox, browsing history, or creative portfolio for model training and are paid in stablecoins or tokens. Privacy-preserving techniques like federated learning and local differential privacy reduce raw data exposure but complicate verification: a buyer who never sees the raw data has a harder time checking that it is authentic and worth the price.
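One of the simplest forms of local differential privacy is randomized response, and a short sketch shows both the privacy gain and the verification problem. Each user flips their yes/no answer with a known probability before reporting it. The function names and the choice of a single boolean attribute are illustrative assumptions, not a description of any particular data market.

```python
import math
import random
from typing import Sequence


def truth_probability(epsilon: float) -> float:
    # Standard randomized response: answer truthfully with probability
    # e^eps / (1 + e^eps), which satisfies epsilon-local differential privacy.
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit: bool, epsilon: float, rng: random.Random) -> bool:
    # Runs on the user's device; only the noisy bit ever leaves it.
    return bit if rng.random() < truth_probability(epsilon) else not bit


def estimate_rate(reports: Sequence[bool], epsilon: float) -> float:
    # The observed rate q equals p*pi + (1-p)*(1-pi), where pi is the true rate.
    # Solving for pi gives an unbiased estimate (requires epsilon > 0).
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

The buyer can recover an accurate aggregate from many reports, yet has no way to tell whether any single report is honest noise or fabricated data. That is the verification cost the paragraph above describes.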

For practitioners, provenance is a product decision: disclose training sources, offer an opt-out, and use on-chain registries where transparency is a competitive advantage, not because the chain solves every IP dispute.