Main page

This is a website about the increasing urgency to ban covert modeling of the human voice. When human testing cannot determine whether a voice is a synthetic fake of some person's voice or an actual recording of that person's real voice, it is a digital sound-alike.

Work is ongoing in the Stop Synthetic Filth! (SSF!) wiki, a successor to the BCM! wiki, to expose the digital sound-alikes (or sound-like-anyone-machines) and digital look-alikes (or look-like-anyone-machines).

Logo of the Stop Synthetic Filth! wiki
Visit Stop Synthetic Filth! wiki and thus help us in the struggle against synthetic human-like fakes!

Synopses of SSF! wiki in other languages

Let's find ways to stop, neutralize, muffle or bell the digital sound-alikes!

The video below, 'This AI Clones Your Voice After Listening for 5 Seconds' by 'Two Minute Papers', describes the voice-thieving machine presented by Google Research at w:NeurIPS 2018.

The content below is from Juho Kunsola's draft for a law to ban covert modeling of the human voice

Law proposal to ban covert modeling of human voice

§1 Covert modeling of a human voice

Acquiring a model of a human voice that deceptively resembles some dead or living person's voice, as well as possessing, purchasing, selling, yielding, importing or exporting it, without the express consent of the target, is punishable.

§2 Application of covert voice models

Producing and making available media from a covert voice model is punishable.

§3 Aggravated application of covert voice models

If the produced media is used for the purpose of framing a human target or targets for crimes, or of defaming the target, the crime should be judged as aggravated.


The content below is from Stop Synthetic Filth! wiki on digital sound-alikes.

Digital sound-alikes

Living people can defend[footnote 2] themselves against digital sound-alikes by denying the things the digital sound-alike says, if those things are presented to the target, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.

‘Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis’ 2018 by Google Research (external transclusion)

The iframe below is transcluded from 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"' at google.github.io, the audio samples of a sound-like-anyone machine presented at the 2018 w:NeurIPS conference by Google researchers.

Observe how good the "VCTK p240" system is at deceiving a listener into thinking that a real person is doing the talking.
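
For readers wondering what is inside such a sound-like-anyone machine: the 2018 paper describes a three-stage pipeline consisting of a speaker encoder, a synthesizer and a vocoder. The sketch below is a minimal, illustrative outline of that data flow in Python; the class names and their stub bodies are hypothetical placeholders of ours, not the researchers' code.

import numpy as np

# Illustrative, hypothetical stubs -- NOT the actual Google models.
# Per the 2018 NeurIPS paper, the pipeline has three stages:
#   1. a speaker encoder that turns a few seconds of reference speech
#      into a fixed-size speaker embedding (a "d-vector"),
#   2. a Tacotron 2-style synthesizer that turns text plus that
#      embedding into a mel spectrogram in the target voice,
#   3. a WaveNet-style vocoder that turns the spectrogram into audio.

class SpeakerEncoder:
    def embed(self, reference_audio: np.ndarray) -> np.ndarray:
        # Real model: a network trained on a speaker-verification task.
        # Stub: a deterministic pseudo-embedding derived from the audio.
        seed = int(np.abs(reference_audio).sum() * 1000) % 2**32
        return np.random.default_rng(seed).standard_normal(256)

class Synthesizer:
    def text_to_mel(self, text: str, speaker_embedding: np.ndarray) -> np.ndarray:
        # Real model: a sequence-to-sequence attention network conditioned
        # on the speaker embedding. Stub: an empty 80-band mel spectrogram.
        n_frames = 20 * len(text)        # placeholder frame count
        return np.zeros((80, n_frames))

class Vocoder:
    def mel_to_audio(self, mel: np.ndarray) -> np.ndarray:
        # Real model: an autoregressive neural vocoder. Stub: silence.
        hop_length = 200                 # placeholder samples per frame
        return np.zeros(mel.shape[1] * hop_length)

# The whole attack surface in four lines: a few seconds of anyone's
# recorded speech is enough input for the encoder.
reference = np.zeros(5 * 16000)          # 5 seconds of "stolen" voice at 16 kHz
embedding = SpeakerEncoder().embed(reference)
mel = Synthesizer().text_to_mel("Arbitrary words in the target's voice.", embedding)
audio = Vocoder().mel_to_audio(mel)
print(embedding.shape, mel.shape, audio.shape)

The point of the sketch is the interface, not the internals: once a covert model of a voice exists, producing new utterances in that voice is just a function call, which is why §1 of the law proposal above targets the acquisition of the model itself.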

A picture of a cut-away titled "Voice-terrorist could mimic a leader" from a 2012 w:Helsingin Sanomat warning that the sound-like-anyone machines are approaching. Thank you to homie Prof. David Martin Howard of the w:University of York, UK and the anonymous editor for the heads-up.

Example of a hypothetical 4-victim digital sound-alike attack

A very simple example of a digital sound-alike attack is as follows:

Someone uses a digital sound-alike to call somebody's voicemail from an unknown number and to speak, for example, illegal threats. In this example there are at least four victims:

  1. Victim #1 – The person whose voice has been stolen into a covert model and a digital sound-alike made from it to frame them for crimes
  2. Victim #2 – The person to whom the illegal threat is presented in recorded form by a digital sound-alike that deceptively sounds like victim #1
  3. Victim #3 – Our law enforcement systems, as they are put to chase after and interrogate the innocent victim #1
  4. Victim #4 – Our judiciary, which prosecutes and possibly convicts the innocent victim #1

Thus it is high time to act and to criminalize the covert modeling of the human voice!

Examples of speech synthesis software not quite able to fool a human yet

There are other contenders for creating digital sound-alikes, though as of 2019 their speech synthesis in most use scenarios does not yet fool a human, because the results contain telltale signs that give them away as coming from a speech synthesizer.
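
Some of those telltale signs are visible as well as audible: as of 2019, weaker synthesizers often produce band-limited audio or unnaturally regular harmonics. The sketch below is a minimal, hedged way to eyeball a suspect recording by plotting its mel spectrogram with the librosa library; 'suspect.wav' is a placeholder file name, and visual inspection is a complement to listening, not proof.

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# 'suspect.wav' is a placeholder path for the recording under scrutiny.
y, sr = librosa.load("suspect.wav", sr=None)

# Mel spectrogram in decibels: artifacts of weaker speech synthesizers
# (band-limited output, overly regular harmonics, vocoder buzz) can
# show up here to a trained eye.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)
mel_db = librosa.power_to_db(mel, ref=np.max)

librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Mel spectrogram of suspect audio")
plt.tight_layout()
plt.show()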

Reporting on the sound-like-anyone-machines

Documented digital sound-alike attacks

Limits of digital sound-alikes

w:Thomas Edison and his early w:phonograph. Cropped from w:Library of Congress copy, ca. 1877 (probably 18 April 1878)

The temporal limit of whom, dead or living, the digital sound-alikes can attack is defined by the w:history of sound recording. That article considers the invention of the w:phonograph by w:Thomas Edison in 1877 to be the start of sound recording, though it does mention Scott's phonautograph of 1857 in bold font.

A green graphic stating that ban-covert-modeling.org uses electricity from renewable sources only
Ban-covert-modeling.org is hosted in Finland and uses electricity from renewable sources only.

Enter the Ban Covert Modeling! wiki
Image 1: Separating specular and diffuse reflected light

(a) Normal image under point lighting

(b) Image of the diffuse reflection, which is caught by placing a vertical polarizer in front of the light source and a horizontal one in front of the camera

(c) Image of the highlight specular reflection, which is caught by placing both polarizers vertically

(d) Subtraction of (b) from (c), which yields the specular component (a minimal code sketch of this step follows the caption)

Images are scaled to appear to be of the same luminosity.

Original image by Debevec et al. – Copyright ACM 2000 – http://dl.acm.org/citation.cfm?doid=311779.344855 – Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
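
The separation shown in panel (d) is, mechanically, plain image arithmetic: with the cross-polarized photograph (b) capturing only the diffuse reflection and the parallel-polarized photograph (c) capturing diffuse plus specular, the specular component is their per-pixel difference. Below is a minimal numpy sketch under that assumption; the file names are placeholders.

import numpy as np
from imageio.v3 import imread, imwrite

# Placeholder file names for the two polarizer configurations.
# Cross-polarized (vertical polarizer on the light, horizontal on the
# camera) blocks the specular reflection, leaving only the diffuse.
diffuse = imread("cross_polarized.png").astype(np.float64)

# Parallel-polarized (both polarizers vertical) passes both components.
combined = imread("parallel_polarized.png").astype(np.float64)

# Per-pixel subtraction isolates the specular component; clipping is
# needed because noise and imperfect registration can push values
# slightly negative.
specular = np.clip(combined - diffuse, 0.0, 255.0)

imwrite("specular.png", specular.astype(np.uint8))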
Image 2 (low-resolution rip)
(1) Sculpting a morphable model to one single picture
(2) Produces a 3D approximation
(3) Texture capture
(4) The 3D model is rendered back to the image with weight gain
(5) With weight loss
(6) Looking annoyed
(7) Forced to smile
(A minimal code sketch of the underlying morphable model follows the copyright notice below.)
Image 2 by Blanz and Vetter – Copyright ACM 1999 – http://dl.acm.org/citation.cfm?doid=311535.311556 – Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
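
The manipulations shown in Image 2 rest on the morphable model idea of Blanz and Vetter: a face is represented as a mean shape plus a linear combination of principal components, so 'weight gain', 'looking annoyed' and 'forced to smile' are just different coefficient vectors. The numpy sketch below is an illustrative linear model with made-up dimensions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a mesh of n_vertices 3D points and k
# principal components (the real model is learned from 3D face scans).
n_vertices, k = 1000, 50
mean_shape = rng.standard_normal((n_vertices, 3))            # average face
components = rng.standard_normal((k, n_vertices, 3)) * 0.1   # PCA basis

def morph(alpha: np.ndarray) -> np.ndarray:
    # Linear morphable model: the mean face plus a weighted sum of
    # principal components. Each attribute edit (weight gain, smile,
    # annoyance) corresponds to a direction in coefficient space.
    return mean_shape + np.tensordot(alpha, components, axes=1)

# Fitting the model to one single picture (step 1 in the caption)
# amounts to searching for the alpha whose rendered model best matches
# the photograph; here we only show that different alphas give
# different 3D faces.
neutral = morph(np.zeros(k))
edited = morph(rng.standard_normal(k) * 0.5)
print(neutral.shape, edited.shape, np.abs(edited - neutral).mean())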