OpenAI's API and Tardigrade

OpenAI’s API product is very interesting: https://openai.com/blog/openai-api/

Word on the street is that a private version is out for the more powerful GPT-3 model.

It would be cool to think about what kinds of apps might be possible by distributing the output of a model like Image GPT (https://openai.com/blog/image-gpt/), for example its image samples, for use in a distributed app.

Any thoughts on how machine learning output and distributed storage might play together?

I used to do machine learning for a living. Large datasets are indeed often stored in the cloud, but the network latency and bandwidth of anything beyond the local network make training impossibly slow, so datasets are always downloaded locally first. If anything, the cloud only helps with distributing data to researchers, in a way not unlike, let's say, torrents. And that space already has competition; see e.g. https://academictorrents.com/
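A back-of-envelope sketch of why this is the case. All numbers here are illustrative assumptions (an ImageNet-scale dataset, a typical training-run length, plausible SSD and WAN throughput), not measurements:

```python
# Hypothetical numbers: why streaming training data from remote storage
# every epoch loses badly to downloading it once and reading locally.

DATASET_GB = 150       # dataset size in GB (assumption)
EPOCHS = 90            # number of full passes over the data (assumption)
LOCAL_SSD_GBPS = 2.0   # local sequential read throughput, GB/s (assumption)
WAN_GBPS = 0.0125      # ~100 Mbit/s internet link, GB/s (assumption)

def read_time_hours(size_gb: float, gbps: float, passes: int) -> float:
    """Total time spent just moving the bytes, in hours."""
    return size_gb * passes / gbps / 3600

# Strategy 1: download once over the WAN, then read locally every epoch.
local = (read_time_hours(DATASET_GB, WAN_GBPS, 1)
         + read_time_hours(DATASET_GB, LOCAL_SSD_GBPS, EPOCHS))

# Strategy 2: stream the dataset over the WAN on every epoch.
remote = read_time_hours(DATASET_GB, WAN_GBPS, EPOCHS)

print(f"download once + local reads: {local:.1f} h")   # a few hours
print(f"stream every epoch:          {remote:.1f} h")  # hundreds of hours
```

With these assumed numbers the streaming approach spends roughly two orders of magnitude more time on I/O alone, which is why remote object storage ends up being a distribution channel rather than a training-time data source.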

Besides, despite the hype around large models like GPT trained on huge datasets, most machine learning is done on small datasets (let's say, below a gigabyte) or even tiny ones (let's say, below a megabyte).

In my former job I wouldn't have had a place to use Tardigrade for anything except maybe classical backups.
