I wonder if anyone is aware of this dataset
Sunday, August 24, 2025 - 23:05
I found this by chance and wondered if anyone is aware of it:
https://huggingface.co/datasets/nyuuzyou/OpenGameArt-OGA-BY-4.0
Actually, there is a dataset for each license: https://huggingface.co/collections/nyuuzyou/opengameart-680b5aa2e8ab3ed893af03f0
Ugh, point Hugging Face away from pixel art. It is sacred! And really? We need a dataset of 32x32 bunny spritesheets? Computers can't make these on their own by now?
I was just sharing the info, mainly because some people are interested in backing up the site—and at the very least, this serves as a backup of the assets. That’s my main goal.
As for the purpose of the dataset: it’s not just about pixel art. It includes various graphic styles, music, and more. And honestly, there’s not much anyone can do about it—licenses allow it. Still, if someone finds it useful as a backup, then it’s there for them.
For anyone feeling concerned: There's an important distinction between having a dataset available for training and an AI actually being capable of understanding and using that data well enough to replace artists. Right now, AI is still far from reaching that level—especially when it comes to specialized artistic disciplines.
I doubt that a model trained solely on OGA would be very competitive, considering the vast and diverse datasets used to train most modern models. That said, this is purely speculation—I haven’t tested such a model or seen any artwork generated from it. However, OGA assets could potentially be included in larger datasets depending on their license. Based on my research, assets under CC0 and CC-BY licenses are eligible for inclusion, as they permit reuse—CC0 allows unrestricted use, while CC-BY requires attribution, both of which align with common dataset inclusion standards.
I have an idea for a good use of this dataset. The attribution data is stored in JSON and Excel files, with entries that look like this:
To make the dataset more useful—particularly for identifying tracks that work well together in mixes—I’d like to add two extra fields to each track: "key" and "bpm".
I know how to extract this information myself, but with nearly 1,900 tracks under just one license type, doing it manually would take far too long.
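To give an idea of how the tagging could be batched instead of done by hand, here is a rough Python sketch. It rests on assumptions: each entry is treated as a plain dict, the "file" field name is hypothetical (use whatever the attribution JSON really calls it), and `analyze` is a placeholder for whichever key/BPM detection tool ends up being used.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical analyzer: takes a file path, returns (key, bpm).
# Plug in any real detection tool behind this signature.
Analyzer = Callable[[str], Tuple[Optional[str], Optional[float]]]


def tag_tracks(entries: List[Dict], analyze: Analyzer,
               path_field: str = "file") -> List[Dict]:
    """Return copies of the entries with "key" and "bpm" filled in.

    Entries that already carry both fields are left alone, so the
    job can be stopped and resumed without redoing finished tracks.
    """
    tagged = []
    for entry in entries:
        updated = dict(entry)  # shallow copy, the input is not modified
        if "key" not in updated or "bpm" not in updated:
            key, bpm = analyze(updated[path_field])
            # setdefault keeps any value that was already present
            updated.setdefault("key", key)
            updated.setdefault("bpm", bpm)
        tagged.append(updated)
    return tagged
```

Because already-tagged entries are skipped, several people could each process a slice of the list and the results could be merged afterwards.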
If anyone’s interested in helping process and tag these tracks, we could create a really valuable tool for finding compatible tracks and building better mixes from this library of OGA music.
Let me know if you'd like to get involved!
Could you explain what it is?
(I'm not able to figure it out.)
Hi Blue_Prawn,
What would you like to know—more about the dataset or the JSON files? To summarize, the dataset includes all OGA assets and was last updated four months ago. The JSON files are used for attribution and tagging, initially designed for AI training, but they can also serve other purposes, like organizing and filtering tracks based on the tags they contain.
While the JSON files themselves aren't exactly user-friendly at first glance, I've built a simple browser to display the data in a more accessible way.
As the saying goes, "a picture is worth a thousand words," and if it's animated, even better! So, I've attached an example of what the browser looks like in action.
You can also add new tags—I'm considering adding tags for BPM and key, which would be especially useful for musicians and DJs looking to find tracks that match specific criteria.
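Once tracks carry BPM and key tags, matching could work roughly like the Python sketch below. Everything here is an assumption for illustration: the "key" and "bpm" field names, the key notation ("C", "F#", "Am", "Bbm"), the 6% tempo tolerance, and the compatibility rule itself, which is the common DJ convention of treating keys as compatible when they are the same, relative major/minor, or one step apart on the circle of fifths.

```python
from typing import Dict, List, Tuple

NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
    "Bb": 10, "B": 11,
}


def fifths_position(key: str) -> Tuple[int, bool]:
    """Map a key like "C" or "Am" to (circle-of-fifths position, is_minor)."""
    is_minor = key.endswith("m")
    tonic = key[:-1] if is_minor else key
    pitch_class = NOTE_TO_PC[tonic]
    if is_minor:
        # A minor key shares its position with its relative major.
        pitch_class = (pitch_class + 3) % 12
    # Multiplying by 7 semitones (a fifth) orders keys around the circle.
    return (pitch_class * 7) % 12, is_minor


def keys_compatible(a: str, b: str) -> bool:
    pos_a, minor_a = fifths_position(a)
    pos_b, minor_b = fifths_position(b)
    distance = min((pos_a - pos_b) % 12, (pos_b - pos_a) % 12)
    if minor_a == minor_b:
        return distance <= 1   # same key or a fifth away
    return distance == 0       # relative major/minor


def compatible_tracks(reference: Dict, tracks: List[Dict],
                      bpm_tolerance: float = 0.06) -> List[Dict]:
    """Return tracks whose tempo and key fit the reference track."""
    matches = []
    for track in tracks:
        if track is reference or "key" not in track or "bpm" not in track:
            continue  # skip the reference itself and untagged tracks
        tempo_gap = abs(track["bpm"] - reference["bpm"]) / reference["bpm"]
        if tempo_gap <= bpm_tolerance and keys_compatible(
                reference["key"], track["key"]):
            matches.append(track)
    return matches
```

Calling `compatible_tracks(some_track, all_tracks)` would then return the candidates worth auditioning next to `some_track`; the tolerance and the key rule are easy to loosen or tighten to taste.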
I hope this makes things clearer!