Large bad data: obviously not good. Some have recognized the result as “model collapse”, where bad data causes more bad data to be generated. Small bad data: nothing needs to be said here. Small data that is vetted and of known provenance, perhaps kept within your enterprise walls, is perhaps the way to go until large, good training data appears. When will that happen? I do not think it is on the horizon.