MSX Programming Languages Showdown

Page 3/6

By ARTRAG

Enlighted (6846)

17-09-2013, 07:56

Hehe, the SDCC compiler has inverted the loop, that is why :-)
With Hitech C you have to do it manually, but once you do that you get 13 seconds on a plain Z80 (837 ticks) ;-)
This is the code:

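#include <stdio.h>

unsigned int i;	/* outer counter kept global, so it lives in RAM (_i in the listing) */

void testCode(void) {
	register unsigned int j,s;
	s = 0;
	for (i=0;i<10000;i++)
	  for(j=100;j>0;j--)	/* inner loop inverted by hand: counts down to zero */
	    s++;
	printf("%d\n",s);
}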

corresponding to this:

    8   0000'                   _testCode:
    9   0000' FD E5             	push	iy
   10                           ;main.c: 8: register unsigned int j,s;
   11                           ; _s allocated to iy
   12   0002' FD 21 0000        	ld	iy,0
   13                           ;main.c: 10: for (i=0;i<10000;i++)
   14   0006' 21 0000           	ld	hl,0
   15   0009' 18 0E             	jp	L1
   16                           
   17   000B'                   l5:
   18                           ;main.c: 11: for(j=100;j>0;j--)
   19                           ; _j allocated to bc
   20   000B' 01 0064           	ld	bc,064h
   21   000E'                   l9:
   22                           ;main.c: 12: s++;
   23   000E' FD 23             	inc	iy
   24   0010' 0B                	dec	bc
   25   0011' 78                	ld	a,b
   26   0012' B1                	or	c
   27   0013' 20 F9             	jp	nz,l9
   28   0015' 2A 0000'          	ld	hl,(_i)
   29   0018' 23                	inc	hl
   30   0019'                   L1:
   31   0019' 22 0000'          	ld	(_i),hl
   32   001C' 01 2710           	ld	bc,02710h
   33   001F' B7                	or	a
   34   0020' ED 42             	sbc	hl,bc
   35   0022' 38 E7             	jp	c,l5
   36                           ;main.c: 13: printf("%d\n",s);
   37   0024' FD E5             	push	iy
   38   0026' 21 0000'          	ld	hl,u19
   39   0029' E5                	push	hl
   40   002A' CD 0000*          	call	_printf
   41   002D' C1                	pop	bc
   42   002E' C1                	pop	bc
   43                           ;main.c: 14: }
   44   002F' FD E1             	pop	iy
   45   0031' C9                	ret	
   

By yzi

Champion (444)

17-09-2013, 08:30

Maybe the original test only tests one very narrow thing: can the compiler figure out that it's faster to count to 100 than to 10000 on an 8-bit processor, and that in this particular example it's possible to swap the order of the for loops without changing the output. Based on this test alone, I wouldn't draw too many conclusions.

By yzi

Champion (444)

17-09-2013, 09:07

I mean, if the compiler was really smart, it would leave out the for loops entirely and just print the result. The loop counter variables aren't needed for anything. And the test also assumes that the programmer is clueless.
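
To make the point concrete, here is a minimal sketch of what "just print the result" would amount to. It assumes a 16-bit unsigned int, as on the Z80 compilers in this thread, so the count of 1,000,000 wraps modulo 65536. The function names folded_result and looped_result are made up for illustration, and the explicit masking is only there so the sketch behaves the same on a host with wider ints.

```c
#include <stdio.h>

/* Hypothetical helper: the value the nested loops leave in s,
   computed directly instead of by counting. */
unsigned int folded_result(void)
{
    unsigned long total = 10000UL * 100UL;      /* 1,000,000 increments */
    return (unsigned int)(total & 0xFFFFUL);    /* wrap like a 16-bit unsigned int */
}

/* The same value obtained the slow way, masking s to 16 bits so that
   a 32-bit host reproduces the Z80 behaviour. */
unsigned int looped_result(void)
{
    unsigned int i, j, s;

    s = 0;
    for (i = 10000; i; i--)
        for (j = 100; j; j--)
            s = (s + 1u) & 0xFFFFu;
    return s;
}
```

Both functions return 16960, which is the number the test program prints on a 16-bit target.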

By ARTRAG

Enlighted (6846)

17-09-2013, 09:43

Actually, the fact that SDCC is able to reverse the loop to gain speed, relying on the fact that the counter variable is not used, is a very good feature. It makes the asm closer to what a human would write.
After all, SDCC is still in development, while Hitech C stopped its development in 2001.

Another possible optimization for SDCC could be to demote the counter variable j to unsigned char: again, since it is not used in the output, you can represent it internally in a more efficient way.

@Marq
Are you involved in SDCC development?

[edit]
@yzi, I agree, the smartest thing is to compute the result offline and print it ;-)
Anyway, I did a quick test with gcc for x86, compiling the sample code.
It seems to do the true loop, so I wouldn't ever dare to ask for this level of optimization from a Z80 cross-compiler.

4  	void testCode(void) {
0x00401334	push   %ebp
0x00401335	mov    %esp,%ebp
0x00401337	push   %edi
0x00401338	push   %esi
0x00401339	push   %ebx
0x0040133A	sub    $0x1c,%esp
5  		register unsigned int i,j,s;
6  		s = 0;
0x0040133D	mov    $0x0,%ebx
7  		for (i=0;i<10000;i++)
0x00401342	mov    $0x0,%edi
0x00401347	jmp    0x401357 [testCode+35]
0x00401356	inc    %edi
0x00401357	cmp    $0x270f,%edi
0x0040135D	jbe    0x401349 [testCode+21]
8  		  for(j=100;j>0;j--)
0x00401349	mov    $0x64,%esi
0x0040134E	jmp    0x401352 [testCode+30]
0x00401351	dec    %esi
0x00401352	test   %esi,%esi
0x00401354	jne    0x401350 [testCode+28]
9  		    s++;
0x00401350	inc    %ebx
10 		printf("%d\n",s);
0x0040135F	mov    %ebx,0x4(%esp)
0x00401363	movl   $0x403024,(%esp)
0x0040136A	call   0x401bb0 [printf]
11 	}
0x0040136F	add    $0x1c,%esp
0x00401372	pop    %ebx
0x00401373	pop    %esi
0x00401374	pop    %edi
0x00401375	pop    %ebp
0x00401376	ret

By MicroTech

Champion (385)

17-09-2013, 13:57

Another smart trick to improve performance (without resorting to assembly):

void testCode()
{
	unsigned int i,j,s;

	s = 0;
	for (i = 10000; i; i--)
		for(j = 100; j; j-- )
			s++;
	printf("%d\n",s);
}

868 VDP interrupts = 14.96 s.

Or even better:

void testCode()
{
	unsigned int i,s;
	/*unsigned*/ char j;

	s = 0;
	for (i = 10000; i; i--)
		for(j = 100; j; j-- )
			s++;
	printf("%d\n",s);
}

762 VDP interrupts = 12.7 s.
(I used char instead of unsigned char because the latter is not supported by MSX-C :-( )

By Marq

Champion (387)

17-09-2013, 16:57

Quoting myself from a previous post:

"Well, I didn't mean this experiment as any conclusive test – just a fun one-afternoon test run of different compilers."

There are plenty of better test cases available all around the web, although many of them might rely on functionality (long ints, multiplications and so on) that isn't native to the Z80. Tweaking the code to get better performance is kind of beside the point. For the sake of it, we could implement a slightly more realistic open task, or a set of tasks, which would include at least:

  • Nested loops
  • Conditional jumps
  • Array indexing
  • Basic math
  • Function calls

Can't make it awfully complex if it's to work with BASIC, too, so need to leave out structs etc.
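
As a rough idea of what such a task could look like, here is a hypothetical sketch that touches all five items using only 16-bit integers and a flat array, so it should translate to BASIC fairly directly. It is not an agreed test suite: the array size, the pseudo-random step and all names (next_value, sort_data, is_sorted, run_benchmark) are assumptions made up for illustration.

```c
#include <stdio.h>

#define N 64                     /* arbitrary size, small enough for any target */

static unsigned int data[N];

/* Function call + basic math: one step of a small 16-bit generator. */
static unsigned int next_value(unsigned int x)
{
    return (x * 75u + 74u) & 0xFFFFu;
}

/* Nested loops + conditional jumps + array indexing: bubble sort. */
static void sort_data(void)
{
    unsigned int i, j, t;

    for (i = 0; i < N - 1; i++)
        for (j = 0; j < N - 1 - i; j++)
            if (data[j] > data[j + 1]) {
                t = data[j];
                data[j] = data[j + 1];
                data[j + 1] = t;
            }
}

/* Returns 1 if the array is in ascending order, 0 otherwise. */
static int is_sorted(void)
{
    unsigned int i;

    for (i = 0; i < N - 1; i++)
        if (data[i] > data[i + 1])
            return 0;
    return 1;
}

/* Fills the array, sorts it and returns a 16-bit checksum,
   so every compiler can be checked against the same number. */
unsigned int run_benchmark(void)
{
    unsigned int i, x, sum;

    x = 1;
    for (i = 0; i < N; i++) {
        x = next_value(x);
        data[i] = x;
    }
    sort_data();
    sum = 0;
    for (i = 0; i < N; i++)
        sum = (sum + data[i]) & 0xFFFFu;
    return sum;
}
```

A timing harness would call run_benchmark() some fixed number of times between two reads of the interrupt counter and print the checksum with printf("%u\n", ...). Because the result is actually used, the compiler cannot drop the loops the way it could in the original test.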

By Marq

Champion (387)

17-09-2013, 17:01

By the way, SDCC doesn't seem to benefit at all from that char counter trick. Same 13 s with that.

By ARTRAG

Enlighted (6846)

17-09-2013, 19:41

If you publish a test suite, I can run it on the Hitech cross compiler.

PS
The unsigned char trick works very well with the Hitech cross compiler. This code:

void testCode(void) {
	register unsigned char j;
	register unsigned int i,s;
	s = 0;
	for (i=10000;i;i--)
	  for(j=100;j;j--)
	    s++;
	printf("%d\n",s);
}

runs in 11 seconds (696 ticks) and corresponds to this asm:

    7   0000'                   _testCode:
    8                           ;main.c: 8: register unsigned char j;
    9                           ; _s allocated to de
   10   0000' 11 0000           	ld	de,0
   11                           ;main.c: 11: for (i=10000;i;i--)
   12                           ; _i allocated to bc
   13   0003' 01 2710           	ld	bc,02710h
   14   0006' 18 0B             	jp	l8
   15   0008'                   l5:
   16                           ;main.c: 12: for(j=100;j;j--)
   17                           ; _j allocated to l
   18   0008' 2E 64             	ld	l,064h
   19   000A' 18 02             	jp	l12
   20   000C'                   l9:
   21                           ;main.c: 13: s++;
   22   000C' 13                	inc	de
   23   000D' 2D                	dec	l
   24   000E'                   l12:
   25   000E' 7D                	ld	a,l
   26   000F' B7                	or	a
   27   0010' 20 FA             	jp	nz,l9
   28   0012' 0B                	dec	bc
   29   0013'                   l8:
   30   0013' 78                	ld	a,b
   31   0014' B1                	or	c
   32   0015' 20 F1             	jp	nz,l5
   33                           ;main.c: 14: printf("%d\n",s);
   34   0017' D5                	push	de
   35   0018' 21 0000'          	ld	hl,u19
   36   001B' E5                	push	hl
   37   001C' CD 0000*          	call	_printf
   38   001F' C1                	pop	bc
   39   0020' C1                	pop	bc
   40                           ;main.c: 15: }
   41   0021' C9                	ret	

By PingPong

Prophet (3898)

17-09-2013, 23:16

@ARTRAG: does Hitech C support the volatile keyword?

By ARTRAG

Enlighted (6846)

18-09-2013, 00:17

Yes, it forces the compiler not to reuse a variable's value held in a register, but to retrieve it from RAM on every access.
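
A minimal sketch of that effect in the context of this benchmark, with made-up names (counter, count_with_volatile): because the counter is declared volatile, a conforming compiler has to perform every read and write in memory, so it cannot keep the value in a register pair or fold the loop into a constant. How much slower the generated code gets depends on the compiler and is not measured here.

```c
#include <stdio.h>

/* Hypothetical example variable: volatile, so each access goes to RAM. */
volatile unsigned int counter;

unsigned int count_with_volatile(void)
{
    unsigned int i;

    counter = 0;
    for (i = 100; i; i--)
        counter++;          /* load from RAM, increment, store back, every time */
    return counter;
}
```

The function still returns 100. Only the way the compiler is allowed to get there changes.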

PS
A large collection of C compilers for the Z80:
http://www.z80.eu/c-compiler.html
