Voice synthesis on ISR

Page 31/36

By Grauw

Ascended (10604)

07-03-2021, 14:26

I see, interesting…

I ran an experiment to see exactly what could be going on. I listened to sccLOFI_1c-3.rom in openMSX with set speed 10, which does not (currently) affect the sound chip clock speed, so it is a convenient way to hear exactly what's going on. Additionally, I looked at the waveform with toggle_scc_viewer (modified to update every frame).

Effect 6 has some warble in it, so I tried with that. When the warble occurs I hear a lot more overtones, and in the scc viewer I see the waveform change to a 2nd order wave…

Video of the experiment here.

By ARTRAG

Enlighted (6845)

07-03-2021, 18:43

I see what you mean, and I think that the frames you have spotted are "unvoiced" segments that the pitch estimation algorithm has failed to catch. In those regions, which are usually noisy, I use the highest frequency in the spectrum to approximate the sound. I know it is a very bad strategy, but I wasn't able to think of anything better with only 32 samples.
BTW, the noise you refer to is much more continuous, and I think it is not related to those single frames.

By Grauw

Ascended (10604)

07-03-2021, 19:48

ARTRAG wrote:

BTW, the noise you refer to is much more continuous, and I think it is not related to those single frames.

Hmm yes, you’re right. But if it is due to discontinuities between frames, I would expect the noise to be pitched at 60 Hz, and this is much higher… Also the waveforms shouldn’t change significantly on a frame-by-frame basis, so in principle just a change in tonal character shouldn’t cause a big discontinuity.

In Synthesix I had the issue that the SCC resets the waveform phase when the frequency is set. I avoided it by only setting the frequency when it changes; during pitch bends it wasn’t too noticeable. An easy test would be to check whether the warble disappears with a fixed frequency…

Some of it could also be coming from the source material. If I listen to this clip on Youtube where the vocals have been isolated, I do hear some odd harmonics, more subtle, but maybe they are amplified by the algorithm.

Oh, but I just checked WYZ’s True Survivor ROM posted on page 30 with openMSX, and it’s using the 5-channel technique. That was probably processed differently. The samples of your sccLOFI_1c-3.rom with the newer 1-channel conversion are sounding pretty clean overall.

By ARTRAG

Enlighted (6845)

23-03-2021, 18:51

Here is a new version of the standalone encoder:
https://github.com/cornelisser/TriloTracker/issues/146

I've added more parameters aimed at encoding instruments and non-vocal sounds.
On the command line you can use:
tnn
where nn is a two-digit integer in 00-99 that changes the threshold used to switch the processing between voiced and unvoiced segments. It has the meaning of a probability in 0.00 to 0.99.
nn=00 means that the whole sample is processed as voiced, using the estimated pitch.
nn=99 means that almost the whole sample is processed as unvoiced, using the frequency peak as base frequency.
By default the threshold is now 0 (earlier it was 0.05).

You can also use:
gmmmmm
where mmmmm is a 5-digit number representing the SCC period.
Pay attention to the number of digits: there have to be 5.
For example, with mmmmm=01696 you get note C2.
This parameter forces the pitch used in the waveform approximation to a known value.
It is useful for sampling instruments where the note played by the sample is known.
By default the pitch is estimated frame by frame.

E.g. try -p60t05g01696 to get:

NTSC frames (p60)
unvoiced processing if probability < 0.05 (t05)
Pitch forced at 3579545/(period+1)/32 Hz (where period is the SCC period = 1696 ) (g01696)

Note: the parameters go without spaces. The parser is very limited but it is case insensitive (i.e. P60 and p60 are the same).
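
The period formula above can also be turned around to build the g parameter from a target frequency. Below is a minimal sketch in Python, assuming the formula exactly as quoted in the post and the 12-bit width of the SCC period registers. The function names are made up for illustration and are not part of the encoder.

```python
SCC_CLOCK_HZ = 3579545  # clock value taken from the formula in the post


def scc_period_to_hz(period):
    """Pitch in Hz for a given SCC period: clock / (period + 1) / 32."""
    return SCC_CLOCK_HZ / (period + 1) / 32


def hz_to_scc_period(freq_hz):
    """Nearest SCC period for a target pitch (inverse of the formula above)."""
    return round(SCC_CLOCK_HZ / (32 * freq_hz)) - 1


def g_option(period):
    """Format the period as the g parameter, always padded to 5 digits."""
    if not 0 <= period <= 4095:  # the SCC period registers are 12 bits wide
        raise ValueError("SCC period must be in 0..4095")
    return f"g{period:05d}"


if __name__ == "__main__":
    print(scc_period_to_hz(1696))  # roughly 65.9 Hz
    print(g_option(1696))          # g01696
```

With period 1696 the formula gives roughly 65.9 Hz, which is close to C2.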

By ARTRAG

Enlighted (6845)

06-12-2021, 18:04

The "noise" in the playback could be caused by the discontinuity between two successive frames. For each frame the encoder generates a wave of 32 samples and a period. Two successive waves usually present a discontinuity that results in the noise, and there is no easy way to predict which sample of the first wave will be playing when the first sample of the second wave is written.
If one were able to predict the value and the position of the sample being played when the new wave is loaded into the SCC RAM, one could change the phase of the new wave (rotating its 32 samples) in order to match the volume of the current sample with the first sample of the new wave and avoid volume jumps.

It all seems damn difficult to do (very timing critical, and it would also depend on the way the ASM player is coded), and it would only reduce the noise caused by the discontinuity, not the noise due to the change in slope.

By Grauw

Ascended (10604)

06-12-2021, 19:20

ARTRAG wrote:

In the meantime I think I've understood that the "noise" in the playback is caused by the discontinuity between two successive frames. As you know, for each frame the encoder generates a wave of 32 samples and a period. Two successive waves usually present a discontinuity that results in the noise we hear, and there is no easy way to predict which sample of the first wave will be playing when the first sample of the second wave is written.

I think if you got the DFT of the waveform, you can zero out all the phases and it will still sound the same but the waveform will have a more continuous progression between frames. E.g. these are both (and sound like) square waves, but the bottom one has all-zero phases:

Although it can cause the waveform to peak out of range, so it needs to be attenuated to prevent that.

Now that I think of it, rather than zeroing out the phases it is probably better to just shift the fundamental's phase to zero (or ½π) and then shift the phases of all the harmonics by the same amount times their index. Then you’ll preserve the original waveforms, just rotated so that they always align with the fundamental as reference point.

By ARTRAG

Enlighted (6845)

06-12-2021, 18:26

Sorry, I don't get why zeroing the fundamental phase of two waves will make them sound continuous...
What do you mean by the fundamental phase of a wave (a set of 32 samples)?

The problem to solve is that when the wave is updated with new data, the SCC is playing an unknown sample from the previous frame. If I knew the value and the position of the "current" sample, I could cyclically rotate the new wave so that the new data matches the current sample's value at the same position in the wave.

This would guarantee a continuous signal (provided that I can match the current volume with any of the samples in the new wave).

By Grauw

Ascended (10604)

06-12-2021, 19:15

ARTRAG wrote:

Sorry, I don't get why zeroing the fundamental phase of two waves will make them sound continuous...
What do you mean by the fundamental phase of a wave (a set of 32 samples)?

I’m talking about aligning two waves, e.g. if you have ____/‾‾‾‾\ and __/‾‾‾‾\__ you can rotate them so that they’re both /‾‾‾‾\____. Without rotating them, if you switch between those waves there will be an audible disparity.

And so then the question becomes by how much to shift the waveforms so that their phase is normalised, how do you relate two waveforms and what do you use as reference, and my answer to that is: the phase of the fundamental. When you take the FFT then the complex number at index 1 will represent its amplitude and phase.
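
The alignment idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it computes DFT bin 1 directly (no FFT library), rounds the phase to a whole number of samples so the wave is rotated rather than resampled, and the function names are made up.

```python
import cmath
import math


def fundamental_phase(wave):
    """Phase (radians) of DFT bin 1, i.e. of the fundamental of the wave."""
    n = len(wave)
    x1 = sum(s * cmath.exp(-2j * math.pi * i / n) for i, s in enumerate(wave))
    return cmath.phase(x1)


def align_to_fundamental(wave):
    """Rotate the wave cyclically so its fundamental has (close to) zero phase."""
    n = len(wave)
    # Advancing the wave by `shift` samples adds 2*pi*shift/n to the phase
    # of bin 1, so this shift cancels the measured phase (rounded to samples).
    shift = round(-fundamental_phase(wave) * n / (2 * math.pi)) % n
    return [wave[(i + shift) % n] for i in range(n)]


if __name__ == "__main__":
    base = [round(80 * math.cos(2 * math.pi * i / 32)
                  + 40 * math.cos(4 * math.pi * i / 32)) for i in range(32)]
    a = [base[(i - 3) % 32] for i in range(32)]   # same shape, rotated by 3
    b = [base[(i - 11) % 32] for i in range(32)]  # same shape, rotated by 11
    print(align_to_fundamental(a) == align_to_fundamental(b))
```

Two copies of the same shape that differ only by a rotation end up identical after alignment, which is the property that makes consecutive frames line up.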

ARTRAG wrote:

The problem to solve is that when the wave is updated with new data, the SCC is playing an unknown sample from the previous frame.

If the wave is similar to the previous one there will not be audible disparities (or only very tiny ones). In normal progression of sound each frame’s waveform will be quite similar to the one that came before it, so you get a quite smooth transition.

By Grauw

Ascended (10604)

06-12-2021, 19:25

Grauw wrote:

Of course writing to the frequency register messes it all up because it resets the SCC’s phase counter. To compensate for that you need a whole different approach, and as you say it is difficult to predict so I’m not sure it’s a viable strategy unless the ISR timing is really precise.

Ah, correction: I double-checked myself with the openMSX source code, and although writing the frequency resets the period counter, it does not reset the waveform phase (unless you set test register bit 5). So there’s no problem at all, aside from the usual period reset glitch. Aligning all waves on the same fundamental phase should do the trick.

I edited my post above to make a bit more sense with this in mind.

By ARTRAG

Enlighted (6845)

06-12-2021, 19:45

When I say to cyclically rotate a wave, I mean shifting the samples in a circle, where the samples that exit from one side re-enter on the other side.
In the DFT domain this is equivalent to multiplying the spectrum by a linear phase; the power spectrum does not change.

Unfortunately the problem is harder than this.
Let me try to explain differently.

Assume you have two successive waves:
w0 = ____/‾‾‾‾\ and w1 = _/‾\__/‾\_
to be played during frame 0 and frame 1 respectively.
Each frame lasts 1/50 sec.

At the end of frame 0, you start loading w1 into the SCC RAM buffer.
You do not know which sample in w0 is being played while you overwrite its value.
If the volume of the current sample (the one being played) in w0 is close to the volume of the sample in w1 that follows its position, the sound will not get a discontinuity.
If * is the position of the "current" sample
w0 = __*_/‾‾‾‾\
I can replace w1 = _/‾\__/‾\_ by
w1 = \_*/‾\__/‾
so that the values at the position of the current sample are equal (or close), thus avoiding discontinuities.
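
Assuming the position and value of the current sample were known, the rotation itself is simple. A minimal Python sketch, with made-up function names, that tries every cyclic shift of the new wave and keeps the one whose value at the current position is closest to the value being played:

```python
def rotate(wave, shift):
    """Cyclic rotation: the sample at index i moves to index (i + shift) % n."""
    shift %= len(wave)
    return wave[-shift:] + wave[:-shift]


def best_rotation(new_wave, pos, current_value):
    """Shift of new_wave that best matches current_value at index pos."""
    n = len(new_wave)
    # After rotating by s, the sample at index pos is new_wave[(pos - s) % n].
    return min(range(n),
               key=lambda s: abs(new_wave[(pos - s) % n] - current_value))


if __name__ == "__main__":
    w1 = [0, 10, 20, 30]
    s = best_rotation(w1, pos=2, current_value=0)
    print(s, rotate(w1, s))
```

This matches only the value at one position; it does nothing about the slope of the signal, and it still needs the answer to the question that follows.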

So the problem is: which is the "current sample", i.e. the one being played when w1 is written?

At first glance, if you assume that the sample phase resets each time you write the frequency, it depends on the ratio between the period of w0 (divided by 32) and the frame duration.
It is quite a mess to implement, and the delay introduced by the player also counts...
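
As a rough illustration of that ratio, here is a Python sketch of the estimate. It rests on idealised assumptions that do not hold on real hardware: playback starts at sample 0 at the start of the frame, the new wave is written exactly one frame later, and the player adds no delay. The function name is made up; the clock value is the one from the period formula earlier in the thread.

```python
SCC_CLOCK_HZ = 3579545  # same clock value as in the period formula


def estimate_current_index(period, frame_hz=50, wave_len=32):
    """Index of the sample playing one frame after the phase was at sample 0.

    The SCC steps to the next sample every (period + 1) clock cycles, so the
    number of steps in one frame follows from the clock and the frame rate.
    """
    clocks_per_frame = SCC_CLOCK_HZ / frame_hz
    samples_played = int(clocks_per_frame // (period + 1))
    return samples_played % wave_len


if __name__ == "__main__":
    print(estimate_current_index(1696, frame_hz=50))
    print(estimate_current_index(1696, frame_hz=60))
```

Any jitter in when the ISR actually writes the wave shifts this index, which is why the estimate is hard to rely on in practice.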
