In short, data have to be ‘serialised’, that is, exposed in a format that machines can parse easily, ideally without the need for proprietary software. Some formats are technically machine-readable (HTML, Excel) but not necessarily easy to parse, either because they are not rigorously structured or because the algorithms needed to parse them are proprietary.
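To make "easy to parse" concrete, here is a minimal sketch in Python that uses only the standard library, so no proprietary software is involved. It reads a small CSV text and re-serialises it as JSON. The sample data and the names `CSV_TEXT`, `parse_csv` and `to_json` are assumptions made up for illustration; they do not come from any particular dataset or tool.

```python
import csv
import io
import json

# Hypothetical sample records, serialised as CSV text (illustrative only).
CSV_TEXT = "station,temperature_c\nA,12.5\nB,14.0\n"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_json(rows):
    """Re-serialise the parsed rows as a JSON string."""
    return json.dumps(rows, indent=2)


rows = parse_csv(CSV_TEXT)
json_text = to_json(rows)

# Reading the JSON back yields the same records: the structure is explicit,
# so no guessing about layout is needed.
round_tripped = json.loads(json_text)
print(round_tripped)
```

Note that the CSV reader returns every value as a string: plain CSV carries no type information, so a consumer still has to decide that `temperature_c` is a number. That is one reason a more explicitly typed serialisation such as JSON can be easier to work with downstream.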