Training an AI to Generate Sprites
I am creating a game, and while I can create pretty good 16x16 or 32x32 sprites, I do not have enough lifetimes to generate what I need, nor the money to pay an army to do so. Big game companies can do this, but not me.
I am also a computer scientist, and am very familiar with AI art generation and neural networks.
What I would like to do is see whether it is possible to train Stable Diffusion (a specific AI image generation neural network) to convert a provided artwork into appropriate sprites.
I have considered how I would do this using a traditional computer program... one cannot simply pixelate the image. Getting the "look and feel" right depends on interpreting the image and deliberately distorting the size, location, colors, or contrast of some elements (such as eyes and a mouth) that would otherwise simply be lost or blurred in straightforward pixelation, while also discarding details that are extraneous to the sprite (such as the scales on a dragon, or the many highlights on a knight's suit of armor).
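To illustrate the problem with naive pixelation, here is a minimal sketch (pure Python, grayscale values only): averaging each block of pixels simply blurs a small high-contrast feature like an eye into the surrounding color, rather than keeping or reinterpreting it.

```python
# Naive pixelation: average each block of an image down to one pixel.
# A tiny dark "eye" in a large skin-tone patch just vanishes into the
# average instead of surviving as a dark pixel in the sprite.

def pixelate(pixels, block):
    """Downscale a 2D grid of grayscale values by block-averaging."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            vals = [pixels[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

# A 4x4 "face" patch: mostly skin tone (200) with one dark eye pixel (20).
patch = [
    [200, 200, 200, 200],
    [200,  20, 200, 200],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
print(pixelate(patch, 4))  # [[188]] -- the eye has averaged away
```

A hand-drawn sprite would instead keep (and likely enlarge) the eye, which is exactly the kind of interpretive judgment the training is meant to capture.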
The bottom line is that doing this requires "training data": pairs of images that closely align, such as a photo of a loaf of bread, or artwork of a loaf of bread, paired with a sprite image of the same loaf of bread. That part is very important, the combination of the original and the closely matching sprite, so that the AI can learn what to do when given an original: what to keep, what to throw away, and what to reinterpret.
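In practice the pairing can be as simple as two folders whose files share names. A minimal sketch (the folder names `originals/` and `sprites/` are my own assumption, not a convention from any particular trainer):

```python
# Hypothetical layout: originals/bread.png pairs with sprites/bread.png.
# Matching by filename stem keeps each original aligned with the sprite
# that was made from it.
from pathlib import Path
import tempfile

def collect_pairs(original_dir, sprite_dir):
    """Return (original, sprite) path pairs that share a filename stem."""
    originals = {p.stem: p for p in Path(original_dir).glob("*.png")}
    sprites = {p.stem: p for p in Path(sprite_dir).glob("*.png")}
    return [(originals[s], sprites[s])
            for s in sorted(originals.keys() & sprites.keys())]

# Tiny demo with throwaway files: only "bread" exists in both folders.
root = Path(tempfile.mkdtemp())
(root / "originals").mkdir(); (root / "sprites").mkdir()
for name in ("bread", "dragon"):
    (root / "originals" / f"{name}.png").touch()
(root / "sprites" / "bread.png").touch()
pairs = collect_pairs(root / "originals", root / "sprites")
print([p[0].stem for p in pairs])  # ['bread']
```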
I am not certain this can easily be done. There is a simple training mechanism built into modern AI image generation software known as a LoRA which might be able to do this, or it might involve using open-source GPT-style software and training it more thoroughly. I won't know until I try, but I have repeatedly found that modern AI implementations surprise me (by sometimes succeeding and other times failing in surprising ways).
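For anyone unfamiliar with why LoRA training is so much cheaper than full fine-tuning: the base model's weight matrix W is frozen, and only two small low-rank matrices A and B are trained, so the effective weight becomes W + B·A. A toy sketch of just that arithmetic (nested lists, no ML framework):

```python
# LoRA in a nutshell: W (m x n) stays frozen; training learns only
# A (r x n) and B (m x r) with r small, so the effective weight is
# W + B @ A using far fewer trainable parameters than W itself.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_weight(W, A, B, scale=1.0):
    """Effective weight W + scale * (B @ A)."""
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Toy example: a 2x2 frozen weight adapted with rank-1 matrices.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # r=1, n=2
B = [[0.5], [0.5]]        # m=2, r=1
print(lora_weight(W, A, B))  # [[1.5, 0.5], [0.5, 1.5]]
```

With rank r of 4-16, the trainable parameters are a tiny fraction of the base model, which is why a LoRA can be trained on a consumer GPU from a few hundred image pairs.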
If anyone would like to partner in this effort, I'd like to at least open discussions about what is possible. For example, does anyone have a collection of sprites and original artwork used to produce them (versus the way I do it, which is to make it up entirely in my head from scratch), or can anyone easily generate a set of training data by taking their existing sprites and retroactively generating more detailed images that would serve as a proxy original-source for the sprite(s)?
As a side note, I know that many people are frightened of or offended by AI art ("it's not real art"). I don't want to get into that discussion. If you don't like this idea, please just move on. But the point of ALL AI at the moment is to enable people who have imagination, or have some personal abilities, or both, to do more than they otherwise could. That is not a bad thing, any more than developing prosthetics so that amputees can walk and run is a "bad thing". I can both imagine and program entertaining games, and write entertaining scripts and scenarios, and develop working game mechanics, and even generate my own art, but I just don't have the time to do it all.
And the point is not to eliminate the artist from the equation, or make "open game art" obsolete. The point is to empower actual artists to be more prolific and empowered than they already are.
Lastly, note that AIs are notoriously bad at taking direction, which is a huge limitation in using them to do things like generate game art. If you tell one to draw a dragon smoking a cigarette, you might well wind up with a dragon made of smoke, or a cigarette with a dragon drawn on the wrapper, or a biker with a dragon tattoo smoking a cigarette. (And try telling an AI to draw a mouse wearing glasses, then tell it to redraw it without glasses... an AI will almost invariably redraw the image with MORE glasses in it.)
Hi, good idea, I think it is absolutely possible to achieve what you intend.
I will give you some advice based on my own experience; hopefully you will find something useful in it:
Training a network to achieve your goals is a horrendous amount of work and energy, probably a lot more work than drawing all the assets yourself.
A.I. is still unable to follow direction or achieve coherence; the only way I know to achieve coherence with A.I. art is dual conditioning: prompts + base images. Another thing that helps if the image is too complex (like the smoking dragon you mentioned in your thread) is to divide the image into meaningful objects: you can generate the dragon and the cigarette or smoke in separate steps and, if you get the right perspective, composite them manually.
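The manual compositing step at the end is just standard alpha blending. A minimal sketch of the "over" operator on single RGBA pixels (channels normalized to 0.0-1.0), which is what an image editor does when you paste one generated layer over another:

```python
# Composite a separately generated element (e.g. the smoke) over the
# base image pixel-by-pixel using the standard "over" alpha blend.

def over(fg, bg):
    """Composite one RGBA pixel over another; channels in 0.0-1.0."""
    fr, fg_g, fb, fa = fg
    br, bg_g, bb, ba = bg
    out_a = fa + ba * (1 - fa)
    if out_a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    blend = lambda f, b: (f * fa + b * ba * (1 - fa)) / out_a
    return (blend(fr, br), blend(fg_g, bg_g), blend(fb, bb), out_a)

# Half-transparent white smoke over an opaque red dragon pixel.
print(over((1.0, 1.0, 1.0, 0.5), (1.0, 0.0, 0.0, 1.0)))
# -> (1.0, 0.5, 0.5, 1.0): a pinkish, fully opaque result
```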
There are enough trained networks for most art styles that it is a lot more practical to find and use the closest one than to train one yourself, unless you really need very specific outputs.
In my experience, conditioning the A.I. at such a low resolution (16x16 or 32x32) results in garbage most of the time. I would suggest creating high-resolution base images and then finding the right way to reduce the size.
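For the size-reduction step, one common starting point is nearest-neighbor sampling, which keeps hard edges instead of blurring them the way averaging does. A small sketch on a plain 2D grid of pixel values:

```python
# Nearest-neighbor downscale: pick one source pixel per output pixel.
# Unlike block-averaging, this preserves hard color edges, which suits
# pixel art (though it can drop thin features that fall between samples).

def downscale(pixels, out_w, out_h):
    """Nearest-neighbor downscale of a 2D pixel grid."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[y * h // out_h][x * w // out_w]
             for x in range(out_w)] for y in range(out_h)]

# 64x64 checkerboard reduced to a 16x16 sprite-sized grid.
big = [[(x + y) % 2 for x in range(64)] for y in range(64)]
small = downscale(big, 16, 16)
print(len(small), len(small[0]))  # 16 16
```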
So, in summary, the only way to save time instead of spending more time (and energy) achieving the results you want is to find the closest weights that fit your needs, to use dual conditioning (prompts + base images), and to use high-resolution, high-quality base images for the conditioning.
I hope some of this feedback helps; good luck with your project, and I hope you achieve great results.
For me it would already be a giant leap to pair a generic AI with a Spritesheet Generator, so you can type/talk rather than clicking and clicking and clicking and scrolling and selecting and clicking and...
AI is also already pretty good at generating pixelart faces, so the feature of adding a specific face to a spritesheet with a few words would be great: generate profile face, mirror it, generate front, generate back, put 4 frames into an image, apply the mini-spritesheet to main spritesheet, et voila.
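Several steps in that workflow are trivially scriptable even without any AI; mirroring a profile sprite, for instance, is just reversing each pixel row. A tiny sketch:

```python
# "Generate profile, mirror it": flipping a sprite horizontally is just
# reversing every row of pixels, so only one side view needs generating.

def mirror(sprite):
    """Flip a sprite horizontally (sprite = rows of pixel values)."""
    return [list(reversed(row)) for row in sprite]

face_right = [
    [0, 1, 1],
    [0, 1, 2],   # 2 marks the eye on the facing side
    [0, 1, 1],
]
print(mirror(face_right))  # the eye moves to the opposite side
```

The front/back frames and the spritesheet assembly are the parts where the AI (or the generator tool) still has to do the real work.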
Yes, that spritesheet generator is an awesome tool, and in the right hands, combining it with an A.I. can get you great personalized results for your assets while sparing you the time of creating the base images for the dual conditioning. Good advice to combine both worlds! I don't think we will see software that combines generative art with that sprite sheet generator any time soon, but taking the best of those tools and combining them manually should not be too hard, I think.
Hi @Lord Bob,
I have some experience with this. The best result I have found so far is to let the AI generate a high-res image and then use that in image-to-image with some pixel-art LoRAs to generate pixel art. The result is sometimes in the desired style but is not pixel-accurate. However, it is less tedious to clean up an image than to make something from scratch. There are also add-ons for Automatic1111 with fixed palette and resolution binders which may help.
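Conceptually, a fixed-palette binder snaps every generated pixel to its nearest color in a chosen palette; a minimal sketch of that idea using squared RGB distance (the palette here is an arbitrary example, not taken from any particular add-on):

```python
# Fixed-palette snapping: replace each pixel with the nearest palette
# color, measured by squared RGB distance. This is the cleanup step that
# turns "almost pixel art" output into strictly palette-bound pixel art.

def snap_to_palette(pixel, palette):
    """Return the palette color closest to the pixel (RGB tuples)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(palette, key=lambda c: dist(pixel, c))

palette = [(0, 0, 0), (255, 255, 255), (200, 40, 40)]
print(snap_to_palette((180, 60, 50), palette))  # (200, 40, 40)
```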
RKP