cloned voices as open source assets
Tuesday, July 25, 2023 - 05:54
Would it be possible to share cloned voices as assets on this site, or is there any incompatibility with the rules I am not aware of?
Just wondering, because voices to use in games are one of the things that are now possible to make but hard to find, and since finding voice actors without a budget is quite difficult, I was wondering if it would be a good idea to render them with machine learning for use in open source projects...
Well, I think it depends on what program you used. I believe OGA would only allow AI generated art/music/voices if both of the following are true:
1. The company permits commercial use.
2. The AI was trained on a public domain dataset.
There are some out there, e.g. Replica Studios (an AI voice generation app) uses a dataset licensed CC-BY 4.0, and I believe you can upload your own voice samples to it. They do have a free trial if you want to try it out.
without the consent of the recording artist (the voice that is being cloned) and without the recorded work (the actual audio files being used to clone the voice) being open content, i would say this falls under the same peril as other AI art. using a dataset that consists of copyrighted material is stealing. that's my opinion, courts haven't really decided yet how this will be treated.
for now, i would think that submitting voice clones made using a non-open-content dataset would be cause for a "files temporarily unavailable" flag on the asset.
with that said, creating an open content set of voice clones would be a great idea. it would be difficult but not impossible to find voiceover recording artists to sign off on their voices being cloned, because it would be a one-time-pay-me-now-then-i-am-obsolete type of deal.
do you know much about voice cloning ai, and how to train datasets? if so, i would encourage you to start a project looking for volunteer voiceover actors to record audio for the dataset. i imagine with some work you could create something that would be a worthwhile asset to indie devs and open source projects.
i tried doing some voice cloning on some of the cc0 voiceover assets submitted by Kenney, but with very poor results. you need quite a lot of recorded audio to get a good dataset.
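On the point about needing a lot of recorded audio: here is a minimal Python sketch, assuming the recordings sit as uncompressed .wav files in a single folder, for adding up how many minutes a candidate dataset actually contains. The function names and the folder layout are made up for illustration; only the standard library `wave` module is used. How many minutes are enough depends on the cloning tool, so no threshold is assumed here.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Return the duration of one uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()


def dataset_minutes(folder):
    """Sum the duration of every .wav file directly inside `folder`, in minutes."""
    total = sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))
    return total / 60.0
```

For example, `dataset_minutes("voiceover_pack")` (a hypothetical folder name) would return the combined length of the clips in that folder, which gives a quick idea of whether a voice pack is anywhere near large enough before spending time on training.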
i don't have a good microphone at the moment, but i could get one in the future. if we can come up with a standard script that would be good for training a dataset, for voice actors to read, this could become a thing. i will do some research on voice cloning datasets, and see if we can come up with a script. then all we would need would be voice actors willing to participate under the terms that their voiceovers would be part of an open content dataset. i would be interested in participating in this project, but i would be an advocate for CC0 licensing and not -BY or -SA. the fewer restrictions there are on content the better it is for the world, in my opinion.
I've never done any machine learning training, I've just used already trained so-called weights to test the technology, but I think I could do that. The problem of the A.I. being trained on non-open datasets should go away, I think, if you only use a voice from someone who permits his / her voice to be cloned, correct me if I'm wrong. The challenges in achieving a collection of good quality cloned voices, as Ragnar Random mentioned above, would be to have a large and high quality collection of audio, and to find people willing to let their voice be "open sourced".
There are other issues that come to my mind, such as someone making bad use of those cloned voices even if it does not break the license, but nevertheless I think there is a lot of potential in a set of cloned voices for creative projects, and it could eventually enhance their quality if used well...
yeah the issue is the dataset for the most part
the algorithm that uses the dataset has a license of course, but that algorithm is software. if you draw a picture in Photoshop, the art produced does not inherit the Photoshop license.
if the algorithm is run on an online webhost, then the person hosting the algorithm can have their own Terms of Use that could conflict with licensing the generated works as open content even if you train a dataset using open content on their platform.
would be nice if we could get someone in the OGA community with good knowledge of python to team up, most of the ai algorithms i am familiar with use python....
Ragnar Random: scripts make life easier, especially for technology like this that most of the time does not even have a GUI, so if you create one that works please let me know, I'm very interested.
> using a dataset that consists of copyrighted material is stealing. that's my opinion, courts haven't really decided yet how this will be treated.
Well, when using databases of copyrighted material benefited Google and other big search engines who used it for commercial purposes - it was not stealing but fair use. Now as the threat arises that it may benefit individuals courts will decide that it's stealing and will impose limitations only large corporations will be able to comply with (because suddenly Microsoft is leading the parade of C2PA ;) and mass hysteria in social media may just be proper marketing effort).
IMHO training a model on whatever material is ok as long as the output doesn't contain anything from that material in a recognizable form. The problem with current AI datasets is that when you ask for "cute little monster" (we've talked about that in a different topic :D) you can easily get something that looks close enough to some Pokémon, which will get you into trouble with Nintendo - and as such it's better to avoid using such datasets for this specific reason. My idea of using AI generated pictures as portraits in a JRPG quickly cooled down when I got Nick from Zootopia as one of the results :D
But let's not dwell on that :) Too much has already been said and nobody was asking any of us.
> voice actors willing to participate under the terms
If you do get to some good result with that, ping me. I have a more or less good mic and some experience in voice acting. Not sure if I'll manage to pull out enough dedication to finish the job (I know I'm quick to promise but often fail in the end, so no promises), but let's try.
Plus if it works I can do a dozen different voices.
> come up with a script
I'm afraid there can be some problems with actual in-game voice. Reading a monotonous text is one thing. Having a computer game character react emotionally to in-game events is a different thing. I'd suggest looking at the experience of https://github.com/DanRuta/xVA-Synth - they've created an in-app editor for tempo and pitch (allowing you to control emotion). Not sure how easy this generator is to use though.
They also have a dataset, at least from Bethesda's games. So using a dataset from some of the open source games (like FreeDroid RPG, Valyria Tear or Dink Smallwood) can be a good (time-tested) option.
I didn't study the issue too deeply, but the xVASynth license looks like GPLv3, so we can even use this tool, just to generate a copyright-clean database.
EDIT: I've just realized that training on LibriVox recordings may be a good start. It won't give any expressions, but the public domain recordings are already there together with public domain texts of the works. It just needs someone to listen and prepare them for processing. This way the model can already have literally dozens to hundreds of voice actors ready. Still, voiced reading (or even real-time synthesis) of in-game text may be beneficial as a game asset.
eugeneloza, in order to get good results, first I need a good collection of audio to train the A.I.
One thing I noticed is that you can make a voice speak any language, but the cloned voice makes the same pronunciation mistakes that a non-native speaker who hasn't mastered the language would make: bad pronunciation of vowels and consonants due to missing examples in the given language. That's why it would be important, in my opinion, to have voices in several languages.
I've never trained an A.I. to clone voices but it would be an interesting (and eventually useful) experiment.
I already tested and experimented with cloned voices and they sound quite good but I'm not sure how much training and data was needed to create them.
I have another question that is related to this topic, so I decided not to open a new one: what about cloned voices for singing / adding vocals, if they are free? I notice many voices from https://vsinger.com/ and https://vocadb.net/, which are used in well-known software such as Vocaloid and other similar apps to sing melodies along with the tracks, have a free license. Does anyone know what that free license means? Is it possible to release tracks with them and license them as public domain?
Here is another piece of software which uses A.I. and makes use of some of those cloned vocals, with info about the license: https://support.acestudio.ai/article/23-introduction-of-ace-studio-licen...
As can be seen on that site, many virtual singers are offered as "license free". I would be interested to know what the legal situation is when releasing tracks using those vocals. Could they be public domain too?
However! The license granted by Ace Studio is not the same thing as a license (if any) granted by the owners of the training data. As is common in AI these days, AI trainers will scrape publicly available assets without obtaining permission for their use. "Publicly available" is not the same as "Public Domain". e.g. images from Google Image Search are publicly available, but 90% are copyrighted and non-free. As eugeneloza mentioned, this may be considered Fair-Use.... Buuuuut 1) Fair-Use is not Public Domain and it comes with caveats on how it can be used, and 2) This Fair-Use defense is an assumption being generally made by AI trainers. Everyone is just assuming the courts will conclude it's ok to not ask permission from the owners of the training data. OGA can make no such assumptions.
These are assessments from the perspective of OGA policy and do not necessarily mean individual users would be unable to legally use such assets in their projects. What OGA is allowed to do is not the same as what you are allowed to do. That being said, until we have more details on those licenses and training dataset origins, the answer to this question:
... is "no", unfortunately.
--Medicine Storm
Thanks for the clarification medicine storm, I agree with you: if the license is not publicly available and you have to request it privately, and the sites do not state PD or CC0 on their tools and cloned voices, they should not qualify for OGA.
I would not publish any work here or anywhere using them either without a clear publicly available license that clearly states the uses I can do with the software and voices.
I'm simply curious and interested in the potential of those technologies, and since I don't have much idea of what those self-advertised "free license" statements mean, I wanted to have more feedback, which you kindly provided.
I still believe voices could be trained and provided in a way that is completely compatible with foss software and public domain licenses, and that the results of such new technologies can make a great impact for the better on those creations, but the ones I asked about are definitely not created and provided in such a way.
Even if such trained data and tools could be made completely compatible with CC0 licenses, it's completely ok not to want A.I. generated content on a site. I was genuinely interested in feedback regarding those questions and I'm grateful for your response.
here's a text-to-speech engine with a bunch of voices under various cc licenses, some foss-compatible: https://github.com/rhasspy/piper
(though most of them were trained by starting from the model for "lessac", which has a restrictive research license; idk how much that matters though)
and here's an asset pack made with it: https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to-speech
hey, thanks for sharing hecko, I find it very interesting. the voices in the demo sound almost real to me, if it were not for some slight robotic sounds, and the intonation also needs some improvement, but I find it totally usable and some of those problems could at least be masked with some post processing effects.
I'm thrilled by how those technologies improved and how they can be used to improve creations, despite all the dangers and potential misuses they will also carry.