Voice synthesis on ISR

Page 34/36

By GhostwriterP

Hero (656)

10-12-2021, 18:11

Wow, this gives quite a quality improvement (with the -o1 option): much cleaner samples indeed (and nicer waveforms in the visualizers). I will experiment a little further with other samples, eager to see what is possible!

By Grauw

Ascended (10560)

10-12-2021, 23:05

Great to hear! I’m looking forward to Awesome version 2! (though it might lose some “character” ;))

Do you not get the same results with -o2? What type of samples is that with? Do you notice a difference between those that show a clear fundamental (one primary peak and valley) and those that consist primarily of overtones (two or more peaks and valleys)?

If so, it could be worth thinking about selecting the phase of an overtone when the fundamental has low power, or perhaps the pitch detection should more eagerly reject low-power fundamentals (with the additional benefit that you get higher wave resolution)…
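To make the overtone idea a bit more concrete, here is a minimal sketch in plain Python. It is not taken from either encoder: it assumes you already have exactly one pitch period as a list of samples, and the function names and the `min_ratio` threshold are made up for illustration. It measures the power and phase of the first few harmonics and falls back to the strongest overtone as the phase reference when the fundamental is weak.

```python
import cmath
import math

def harmonic_powers(period, max_harmonic=8):
    """Return (power, phase) for harmonics 1..max_harmonic of one pitch period.

    Assumes max_harmonic is below half the number of samples in the period.
    """
    n = len(period)
    result = []
    for k in range(1, max_harmonic + 1):
        # Single DFT bin k: harmonic k of a wave that repeats every n samples.
        bin_k = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(period)) / n
        result.append((abs(bin_k) ** 2, cmath.phase(bin_k)))
    return result

def phase_reference(period, min_ratio=0.1, max_harmonic=8):
    """Pick the harmonic to align on: the fundamental unless it is weak.

    min_ratio is a hypothetical tuning knob: the fundamental is kept as the
    reference while its power is at least min_ratio times the strongest
    harmonic's power.
    """
    harmonics = harmonic_powers(period, max_harmonic)
    strongest = max(range(len(harmonics)), key=lambda k: harmonics[k][0])
    fundamental_power = harmonics[0][0]
    if fundamental_power >= min_ratio * harmonics[strongest][0]:
        return 1, harmonics[0][1]
    return strongest + 1, harmonics[strongest][1]
```

The return value is the harmonic number and its phase in radians, which an encoder could use to decide where each stored period should start.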

@ARTRAG Thanks for the pointers! I had found PEFAC cited in the source code (am about to read it, I read YIN yesterday), I’ll add RAPT to my reading list as well! From what I see so far PEFAC works based on the spectrogram and uses the presence of overtones to its advantage, like I was curious about before, so definitely interesting. It does inspire me with a bit more confidence than the time-domain autocorrelation-based algorithms.
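For reference, this is roughly what the time-domain approach looks like. It is a simplified sketch in the style of YIN (difference function, cumulative mean normalisation, absolute threshold), written from memory of the paper and leaving out its parabolic interpolation and later refinement steps. The function name and default parameters are my own assumptions, not anything from the encoder source.

```python
def yin_pitch(samples, sample_rate, f_min=60.0, f_max=500.0, threshold=0.1):
    """Estimate the pitch in Hz of one frame, or None if no clear period is found."""
    tau_min = max(1, int(sample_rate / f_max))
    tau_max = int(sample_rate / f_min)
    window = len(samples) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the lowest pitch")

    # Difference function: how much the frame differs from itself shifted by tau.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((samples[j] - samples[j + tau]) ** 2 for j in range(window))

    # Cumulative mean normalisation, so small lags are not trivially favoured.
    norm = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        norm[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Take the first dip below the threshold and follow it to its local minimum.
    tau = tau_min
    while tau <= tau_max:
        if norm[tau] < threshold:
            while tau + 1 <= tau_max and norm[tau + 1] < norm[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    return None
```

Because it only compares the waveform with shifted copies of itself, a signal with a weak fundamental and strong overtones can easily lock onto the wrong lag, which is where a spectral method that looks at the overtone pattern should have an advantage.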

By Grauw

Ascended (10560)

10-12-2021, 23:58

Oh, the PEFAC paper was great. It’s pretty short, the method is relatively simple and easy to follow, and it answers a lot of the questions that came to mind. For example, RAPT is also a time-domain algorithm and performs worse than YIN, so it’s probably not super interesting. Also, applying temporal continuity weights to the candidate selection gives a big accuracy improvement and suppresses octave errors (RAPT does this, so it’s still worth a read, and the same idea can be applied to PEFAC). The configuration of the spectrogram and how it’s interpolated to a log scale is also clearly described.
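The temporal continuity idea can be sketched as a small dynamic-programming (Viterbi-style) pass over per-frame pitch candidates. This is a generic illustration, not the weighting from the PEFAC or RAPT papers: the function name, the candidate format and the `jump_cost` value are assumptions chosen to keep the example short.

```python
import math

def smooth_pitch_track(frames, jump_cost=2.0):
    """Choose one candidate per frame, trading candidate strength against pitch jumps.

    frames: non-empty list of frames, each a non-empty list of
    (frequency_hz, score) candidates, where a higher score is a stronger candidate.
    """
    # best[i][c] = lowest total cost of any path ending at candidate c of frame i.
    best = [[-score for _, score in frames[0]]]
    back = []
    for prev, cur in zip(frames, frames[1:]):
        costs, pointers = [], []
        for freq, score in cur:
            # Transition cost grows with the jump in octaves, so a one-octave
            # leap costs jump_cost regardless of absolute frequency.
            options = [best[-1][p] + jump_cost * abs(math.log2(freq / prev_freq))
                       for p, (prev_freq, _) in enumerate(prev)]
            p_best = min(range(len(options)), key=options.__getitem__)
            costs.append(options[p_best] - score)
            pointers.append(p_best)
        best.append(costs)
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    index = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [index]
    for pointers in reversed(back):
        index = pointers[index]
        path.append(index)
    path.reverse()
    return [frames[i][c][0] for i, c in enumerate(path)]
```

With candidates like `[(200, 1.0), (400, 0.6)]`, `[(200, 0.8), (400, 0.9)]`, `[(200, 1.0), (400, 0.5)]`, picking the best score per frame would jump up an octave in the middle frame, while the smoothed track stays at 200 Hz because the two octave jumps cost more than the small score gain.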

By ARTRAG

Enlighted (6828)

11-12-2021, 10:16

In my limited experiments with VOICEBOX, I remember PEFAC performing better than RAPT.
Moreover, its implementation exposed the parameters I needed to change, so for me the choice was easy.

Are you working on a new implementation of the encoder?
My version needs the MATLAB runtime libraries, and this is a very high barrier to its diffusion and use.

If you release a version that doesn't need that cumbersome overhead, it could help spread the use of this encoding method, and it could be bundled with TriloTracker and Realfun 3, which support its samples.

By Grauw

Ascended (10560)

11-12-2021, 15:08

ARTRAG wrote:

Are you working on a new implementation of the encoder?
My version needs the MATLAB runtime libraries, and this is a very high barrier to its diffusion and use.

If you release a version that doesn't need that cumbersome overhead, it could help spread the use of this encoding method, and it could be bundled with TriloTracker and Realfun 3, which support its samples.

It’s more of an exploration: to see how it works, to learn, and to find out whether I can improve it. I’m not sure if there will be a final implementation.

I’m on macOS so I can’t run the .exe conveniently, and I don’t have Matlab either. So instead I’m building some experiments of my own in JavaScript. But I think your executable is quite accessible for most people even though they need to install the Matlab runtime.

Tbh Python or C++ would be a better language choice since they have good signal processing libraries (e.g. numpy/scipy and JUCE). But I do most of my build scripts in JS and there is some fun and learning to be had implementing it myself. A web hosted conversion tool in combination with a MAP article could be interesting, but I’m not sure if I will get that far.

By GhostwriterP

Hero (656)

12-12-2021, 11:24

Grauw wrote:

Great to hear! I’m looking forward to Awesome version 2! (though it might lose some “character” ;))

Here it is: Awesome-update

By Grauw

Ascended (10560)

12-12-2021, 12:27

Very cool, noticeable improvement to sample clarity!

By wimpie3

Champion (393)

12-12-2021, 12:51

Sounds f*cking fantastic. WOW!

By ARTRAG

Enlighted (6828)

13-12-2021, 10:35

Much cleaner now! Great, amazing work!
Yeah! Are you ready? ;-) You are applying effects to sampled speech too.
The "Awesome" sample is modulated in real time.

Can you also use samples as if they were instruments?

By gdx

Enlighted (5333)

13-12-2021, 11:41

WOW!!! Is this version already available for trial?