This is a website about the increasing urgency to ban covert modeling of the human voice. When human testing cannot determine whether a voice is a synthetic fake of some person’s voice or an actual recording of that person’s real voice, the fake is a digital sound-alike.
Work is ongoing in the Stop Synthetic Filth! (SSF!) wiki, a successor of the BCM! wiki, to expose digital sound-alikes (or sound-like-anyone machines) and digital look-alikes (or look-like-anyone machines).
Synopses of SSF! wiki in other languages
- Stop Synthetic Filth! WordPress in English SSF!
- Arrêtons les saletés synthétiques! home page in French ASS!
- Stoppi synteettiselle saastalle! home page in Finnish SSS!
- Stoppa syntetisk orenhet! home page in Swedish SSO!
- Stopp sünteetisele saastale! home page in Estonian SSS!
Let’s find ways to stop, neutralize, muffle or bell the digital sound-alikes!
The video below, ‘This AI Clones Your Voice After Listening for 5 Seconds’ by ‘Two Minute Papers’, describes the voice-thieving machine presented by Google Research at w:NeurIPS 2018.
The content below is from Juho Kunsola‘s draft of a law to ban covert modeling of the human voice.
Law proposal to ban covert modeling of human voice
§1 Covert modeling of a human voice
Acquiring a model of a human voice that deceptively resembles the voice of some dead or living person, as well as the possession, purchase, sale, yielding, import and export of such a model, without the express consent of the target, is punishable.
§2 Application of covert voice models
Producing and making available media from a covert voice model is punishable.
§3 Aggravated application of covert voice models
If the produced media is used to frame a human target or targets for crimes, or to defame the target, the crime should be judged as aggravated.
The content below is from Stop Synthetic Filth! wiki on digital sound-alikes.
Living people can defend themselves against a digital sound-alike by denying the things it says, if these are presented to them, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.
‘Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis’ 2018 by Google Research (external transclusion)
- At the 2018 w:Conference on Neural Information Processing Systems (NeurIPS), the work ‘Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis’ (at arXiv.org) was presented. The pre-trained model is able to steal voices from a sample of only 5 seconds with almost convincing results.
The iframe below is transcluded from ‘Audio samples from “Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”‘ at google.github.io, the audio samples of a sound-like-anyone machine presented at the 2018 w:NeurIPS conference by Google researchers.
Observe how good the “VCTK p240” system is at deceiving the listener into thinking that a real person is doing the talking.
Example of a hypothetical 4-victim digital sound-alike attack
A very simple example of a digital sound-alike attack is as follows:
Someone uses a digital sound-alike to call somebody’s voicemail from an unknown number and leave, for example, illegal threats. In this example there are at least two victims:
- Victim #1 – The person whose voice has been stolen into a covert model, from which a digital sound-alike is made to frame them for crimes
- Victim #2 – The person to whom the illegal threat is presented in recorded form by a digital sound-alike that deceptively sounds like victim #1
- Victim #3 – Arguably, our law enforcement systems are also victims, as they are set to chase after and interrogate the innocent victim #1
- Victim #4 – Our judiciary, which prosecutes and possibly convicts the innocent victim #1.
Thus it is high time to act and to criminalize the covert modeling of human voice!
Examples of speech synthesis software not quite able to fool a human yet
There are some other contenders for creating digital sound-alikes, though as of 2019 their speech synthesis does not yet fool a human in most use scenarios, because the results contain telltale signs that give it away as a speech synthesizer.
- Lyrebird.ai (listen)
- CandyVoice.com (test with your choice of text)
- Merlin, a w:neural network based speech synthesis system by the Centre for Speech Technology Research at the w:University of Edinburgh
- ‘Neural Voice Cloning with a Few Samples’ at papers.nips.cc, w:Baidu Research‘s shot at a sound-like-anyone machine, did not convince in 2018
Reporting on the sound-like-anyone-machines
- “Artificial Intelligence Can Now Copy Your Voice: What Does That Mean For Humans?” May 2019 reporting at forbes.com on w:Baidu Research‘s attempt at the sound-like-anyone machine demonstrated at the 2018 w:NeurIPS conference.
Documented digital sound-alike attacks
- Sound-like-anyone technology has found its way into the hands of criminals: in 2019, Symantec researchers knew of 3 cases where the technology had been used for w:crime
- “Fake voices ‘help cyber-crooks steal cash’” at bbc.com July 2019 reporting 
- “An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft” at washingtonpost.com documents a w:fraud committed with a digital sound-like-anyone machine, July 2019 reporting.
Limits of digital sound-alikes
The temporal limit on whom, dead or living, digital sound-alikes can attack is set by the w:history of sound recording. That article starts by noting that the invention of the w:phonograph by w:Thomas Edison in 1877 is considered the start of sound recording, though it also mentions, in bold, Scott’s phonautograph of 1857.