About C / Z80 optimizations (SDCC)

Page 17/17

By zPasi

Champion (471)

24-09-2019, 13:06

PingPong wrote:

Look at the code above on the previous page. Isn't that enough?

There is no code on the previous page. Which page are you referring to?

Anyway, I'm through reading assembly code. I want to see how much difference, if any, there is in practice, in a real program that can be built with both compilers.

For example, register parameters give a big boost, in theory. In practice, if you have more than two 16-bit parameters, or if there is even one automatic variable in the callee function, the boost is cancelled immediately. You get that familiar push ix, ld ix,0, add ix,sp etc., with all the overhead it generates.

There are some weak points in SDCC, but when I know them and am able to avoid them, SDCC generally performs as well as Hi-Tech. Prove me wrong or just shut up.

Quote:

PS: If you don't care about Hi-Tech C or SDCC, why do you cry out at every criticism I make of SDCC like someone undergoing surgery without anesthetics?

I don't care. Your so-called criticism is just not appropriate. There is no substance in it. Why are you doing this?

This thread is about writing efficient C-code for Z80, not comparing compilers.

Quote:

I can feel your pain. ;-)

Just try to behave.

By PingPong

Prophet (3447)

24-09-2019, 20:08

Look at the previous page[S]. I forgot an s.

By zPasi

Champion (471)

24-09-2019, 21:58

PingPong wrote:

Look at the previous page[S]. I forgot an s.

So what? Did you run them and time them? Or count T-states?

By PingPong

Prophet (3447)

24-09-2019, 22:09

It is obviously the same

By zPasi

Champion (471)

25-09-2019, 08:23

PingPong wrote:

It is obviously the same

What is the same?

By DarkSchneider

Paladin (869)

25-09-2019, 09:22

If you want to be productive, create some complex pure C code, without anything MSX-specific (like VDP or BIOS), and compile it. Also no ports or hardware access.
But for SDCC, remember to specify the parameters -mz80 --opt-code-speed
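
As one possible starting point for such a test, here is a small routine with no hardware access at all: a sieve of Eratosthenes that counts the primes below a limit. The names `LIMIT`, `composite` and `count_primes` are made up for this sketch, and the file name in the command is hypothetical; for SDCC it would be built with the flags mentioned above, for example `sdcc -mz80 --opt-code-speed bench.c`.

```c
#include <string.h>

#define LIMIT 1000u

/* One flag per number below LIMIT; non-zero means "not prime". */
static unsigned char composite[LIMIT];

/* Counts the primes below LIMIT with a sieve of Eratosthenes.
   The inner loop starts at 2*i so that j never exceeds 16 bits,
   which matters when unsigned int is 16 bits wide as on the Z80. */
unsigned int count_primes(void)
{
    unsigned int i, j, count = 0;

    memset(composite, 0, sizeof composite);
    for (i = 2; i < LIMIT; i++) {
        if (composite[i])
            continue;
        count++;
        for (j = i + i; j < LIMIT; j += i)
            composite[j] = 1;
    }
    return count;
}
```

Because the routine is plain C, the same source can be fed to both compilers and the generated code or run time compared directly.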

By akumajo

Rookie (29)

25-09-2019, 10:01

I use SDCC version 3.6.0 which is currently provided with Fusion-C.
Here are my practices for programming with SDCC :

I have 1 file msxbios.h with :

#ifndef  __MSXBIOS_H__
#define  __MSXBIOS_H__

// use RST assembler mnemonic to call
#define CHKRAM  0x00 // RST 0x00 > Check RAM and sets slot for command area.
#define VDP_DR  0x06 // VDP read port address
#define VDP_DW  0x07 // VDP write port address
#define SYNCHR  0x08 // RST 0x08 > Checks whether the current character pointed to by HL is the one desired.
#define CHRGTR  0x10 // RST 0x10 > Gets the next character (or token) of the Basic-text
#define OUTDO   0x18 // RST 0x18 > Output to current output channel (printer, diskfile, etc.)
#define DCOMPR  0x20 // RST	0x20 > Compares HL with DE
#define GETYPR  0x28 // RST	0x28 > Returns Type of DAC
#define CALLF   0x30 // RST	0x30 > Executes an interslot call
#define KEYINT  0x38 // RST	0x38 > Executes the timer interrupt process routine

// use CALL assembler mnemonic
#define RDSLT   0x000C // Reads the value of an address in another slot
#define WRSLT   0x0014 // Writes a value to an address in another slot
#define CALSLT  0x001C // Executes inter-slot call
#define ENASLT  0x0024 // Switches indicated slot at indicated page permanently
#define MSXVER  0x002D // MSX version. 0=MSX1, 1=MSX2, 2=MSX2+, 3=turbo R.
...
#define MSXID2  0x002C
#define MSXID3  0x002D

#endif

and 1 file msxsysvar.h

#ifndef  __MSXSYSVAR_H__
#define  __MSXSYSVAR_H__

#define RDPRIM 0xf380 // 5  Routine that reads from a primary slot
#define WRPRIM 0xf385 // 7  Routine that writes to a primary slot
#define CLPRIM 0xf38c // 14 Routine that calls a routine in a primary slot
#define USRTAB 0xf39a // 2  Address to call with Basic USR0
#define LINL40 0xf3ae // 1  Width for SCREEN 0 (default 37)
#define LINL32 0xf3af // 1  Width for SCREEN 1 (default 29)
#define LINLEN 0xf3b0 // 1  Width for the current text mode
#define CRTCNT 0xf3b1 // 1  Number of lines on screen
...
#define PROCNM 0xfd89
#define DEVICE 0xfd99

#endif

I declare in my source.c the variables I am going to use.

__at (LINL40) unsigned int screenwidth;

__at (FORCLR) unsigned char foregroundcolor;
__at (BAKCLR) unsigned char backgroundcolor;
__at (BDRCLR) unsigned char bordercolor;

__at (CSRX) unsigned char cursorx;
__at (CSRY) unsigned char cursory;

__at (CLIKSW) unsigned char clikswitch;

So there is no need for peek or poke functions; you simply use the variables declared this way:

void main (void)
{
	clikswitch=0;
	screenwidth=80;
	screen0();
}

ASM :

	.globl _screenwidth
;--------------------------------------------------------
; special function registers
;--------------------------------------------------------
;--------------------------------------------------------
; ram data
;--------------------------------------------------------
	.area _DATA
_screenwidth	=	0xf3ae
;testrom.c:141: screenwidth=80;
	ld	hl,#0x0050
	ld	(_screenwidth),hl
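
One detail worth checking in the listing above: the header gives LINL40 a size of 1 byte, with LINL32 directly after it at 0xf3af, while `screenwidth` is declared `unsigned int`, so the 16-bit store also writes the high byte (0) to 0xf3af. Below is a host-side sketch of that effect. The `ram` array and the `poke8`/`poke16` helpers are hypothetical stand-ins for Z80 memory and stores, not SDCC features.

```c
#include <stdint.h>

#define LINL40 0xf3ae  /* 1 byte: width for SCREEN 0 */
#define LINL32 0xf3af  /* 1 byte: width for SCREEN 1 */

/* Hypothetical stand-in for the Z80's 64 KB address space. */
static uint8_t ram[0x10000];

/* Mimics "ld (addr),hl": low byte at addr, high byte at addr+1. */
static void poke16(uint16_t addr, uint16_t value)
{
    ram[addr] = (uint8_t)(value & 0xff);
    ram[(uint16_t)(addr + 1)] = (uint8_t)(value >> 8);
}

/* Mimics an 8-bit store such as "ld (hl),#n". */
static void poke8(uint16_t addr, uint8_t value)
{
    ram[addr] = value;
}

/* Returns LINL32 after a 16-bit store of width to LINL40. */
uint8_t linl32_after_word_store(uint8_t width)
{
    ram[LINL32] = 29;              /* default from the header comment */
    poke16(LINL40, width);
    return ram[LINL32];
}

/* Returns LINL32 after an 8-bit store of width to LINL40. */
uint8_t linl32_after_byte_store(uint8_t width)
{
    ram[LINL32] = 29;
    poke8(LINL40, width);
    return ram[LINL32];
}
```

If the neighbouring variable matters, declaring the variable as `unsigned char` instead should presumably make SDCC emit an 8-bit store, although the exact instructions it would choose are not shown here.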

I use "inline" macros for calls to the system functions whenever possible (beware of register backups if interrupts are used):

// CLRSPR
#define CLEAR_SPRITE __asm__ ("call 0x0069");

If someone knows how to use one define inside another, thanks! The following declaration causes a compilation error: #define CLEAR_SPRITE __asm__ ("call CLRSPR");
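
One approach that may answer this question is the standard two-level stringification idiom. The preprocessor does not expand macro names inside a string literal, so the address has to be turned into a string and placed next to "call ". The helper names `STR`, `XSTR`, `CALL_TEXT` and `clear_sprite_text` are made up for this sketch, and whether `__asm__` in SDCC 3.6.0 accepts concatenated string literals is an assumption I have not verified; the string building itself is plain C.

```c
#define CLRSPR 0x0069

#define STR(x)  #x        /* turns its argument into a string literal */
#define XSTR(x) STR(x)    /* expands x first, then stringifies it */

/* Adjacent literals are concatenated: "call " "0x0069" -> "call 0x0069" */
#define CALL_TEXT(addr) "call " XSTR(addr)

#ifdef __SDCC
/* Assumption: __asm__ takes the concatenated literal like any other string. */
#define CLEAR_SPRITE __asm__ (CALL_TEXT(CLRSPR));
#endif

/* Exposes the generated text so it can be inspected on any compiler. */
const char *clear_sprite_text(void)
{
    return CALL_TEXT(CLRSPR);
}
```

The extra `XSTR` level is what makes the difference: `STR(CLRSPR)` alone would produce the text "CLRSPR" rather than the address.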

Functions without parameters or return value:

void screen0() __naked
{
__asm
	xor  a
	jp	CHGMOD
__endasm;
}

gives :

;libmsx.h:32: void screen0() __naked
;	---------------------------------
; Function screen0
; ---------------------------------
_screen0::
;libmsx.h:37: __endasm;
	xor	a
	jp	0x005F

in main function :

;testrom.c:143: screen0();
	call	_screen0

>>> I use "__naked __z88dk_fastcall" for functions with 1 parameter:

void putchar(unsigned char car) __naked __z88dk_fastcall
{
car;
__asm
	ld	a,l
	jp CHPUT
__endasm;
}
;libmsx.h:9: void putchar(unsigned char car) __naked __z88dk_fastcall
;	---------------------------------
; Function putchar
; ---------------------------------
_putchar::
;libmsx.h:15: __endasm;
	ld	a,l
	jp	0x00A2

in main function :

;testrom.c:147: putchar(' ');
	ld	l,#0x20
	call	_putchar

>>> Same (__naked __z88dk_fastcall) for functions that return a value:

unsigned char kbread(unsigned char kbline) __naked __z88dk_fastcall
{
kbline;
__asm
	in    a,(#0x0aa)
	and   #0b11110000
	add   a,l
	out   (#0x0aa),a
	in    a,(#0x0a9)
	cpl
	ld    l,a
	ret
__endasm;
}
;libmsx.h:63: unsigned char kbread(unsigned char kbline) __naked __z88dk_fastcall
;	---------------------------------
; Function kbread
; ---------------------------------
_kbread::
;libmsx.h:75: __endasm;
	in	a,(#0x0aa)
	and	#0b11110000
	add	a,l
	out	(#0x0aa),a
	in	a,(#0x0a9)
	cpl
	ld	l,a
	ret

in main function :

;testrom.c:146: hkb=kbread(8);
	ld	l,#0x08
	call	_kbread

Example: get mouse information:

// Structure for ReadMouseTo output data
typedef struct {
	unsigned char mousePort;
    signed char dx;
    signed char dy;
    unsigned char lbutton;
    unsigned char rbutton;
} MOUSE_DATA;

// Read Mouse Offset x and y, mouse button and return to MOUSE_DATA Structure
void MouseReadTo(MOUSE_DATA *md) __naked __z88dk_fastcall
{
md;
__asm
	ld a,(hl)
	inc hl

	ld de,#0x1310
	and #02
	jr z,GTMOUS
	ld de,#0x6C20

	GTMOUS:
	di
	ld  b,#30
	call    GTOFS2
	and #0x0F
	rlca
	rlca
	rlca
	rlca
	ld  c,a
	call    GTOFST
	and #0x0F
	or  c
	ld (hl),a
	inc hl
	call    GTOFST
	and #0x0F
	rlca
	rlca
	rlca
	rlca
	ld  c,a
	call    GTOFST
	ld  b,a
	and #0x0F
	or c
	ld (hl),a
	ld a,b
	and #0x10
	rlca
	rlca
	rlca
	rlca
	inc hl
	ld (hl),a
	ld a,b
	and #0x20
	rlca
	rlca
	rlca
	rlca
	rlca
	inc hl
	ld (hl),a
	ei
	ld  b,#40
	call WAITMS
	ret

	GTOFST: ld b,#10
	GTOFS2: ld a,#15
	out (#0xA0),a
	in  a,(#0xA1)
	and #0x80
	or  d
	out (#0xA1),a
	xor e
	ld d,a
	call WAITMS
	ld a,#14
	out (#0xA0),a
	in a,(#0xA2)
	ret
WAITMS:
	ld  a,b
	WTTR:
	djnz    WTTR
	.db  #0xED,#0x55
	rlca
	rlca
	ld  b,a
WTTR2:
	djnz    WTTR2
	ld  b,a
WTTR3:
	djnz    WTTR3
	ret
__endasm;
}

in main function :

// declaration:

static MOUSE_DATA md;

// call :

	md.mousePort = 1;
	MouseReadTo(&md);

ASM :

;libmsx.h:134: void MouseReadTo(MOUSE_DATA *md) __naked __z88dk_fastcall
;	---------------------------------
; Function MouseReadTo
; ---------------------------------
_MouseReadTo::
;libmsx.h:230: __endasm;
	ld	a,(hl)
	inc	hl
	ld	de,#0x1310
	and	#02
	jr	z,GTMOUS
	ld	de,#0x6C20
	 GTMOUS:
	di
	ld	b,#30
	call	GTOFS2
	and	#0x0F
	rlca
	rlca
	rlca
	rlca
	ld	c,a
	call	GTOFST
	and	#0x0F
	or	c
	ld	(hl),a
	inc	hl
	call	GTOFST
	and	#0x0F
	rlca
	rlca
	rlca
	rlca
	ld	c,a
	call	GTOFST
	ld	b,a
	and	#0x0F
	or	c
	ld	(hl),a
	ld	a,b
	and	#0x10
	rlca
	rlca
	rlca
	rlca
	inc	hl
	ld	(hl),a
	ld	a,b
	and	#0x20
	rlca
	rlca
	rlca
	rlca
	rlca
	inc	hl
	ld	(hl),a
	ei
	ld	b,#40
	call	WAITMS
	ret
	 GTOFST:
	ld b,#10
	 GTOFS2:
	ld a,#15
	out	(#0xA0),a
	in	a,(#0xA1)
	and	#0x80
	or	d
	out	(#0xA1),a
	xor	e
	ld	d,a
	call	WAITMS
	ld	a,#14
	out	(#0xA0),a
	in	a,(#0xA2)
	ret
	WAITMS:
	ld	a,b
	 WTTR:
	djnz	WTTR
	.db	#0xED,#0x55
	rlca
	rlca
	ld	b,a
	WTTR2:
	djnz	WTTR2
	ld	b,a
	WTTR3:
	djnz	WTTR3
	ret

in main function :

;testrom.c:190: MouseReadTo(&md);
	ld	hl,#_md
	call	_MouseReadTo
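
For readers who find the nibble handling in MouseReadTo hard to follow, the "and #0x0F / rlca x4 / ld c,a ... and #0x0F / or c" sequence can be sketched in C as below. The function name `combine_nibbles` is made up, and the sketch only covers how two 4-bit reads become one signed offset, not the port access or timing.

```c
/* Combines two 4-bit reads into one signed 8-bit offset.
   high_read supplies bits 7..4, low_read supplies bits 3..0;
   any upper bits in either read are masked off first. */
signed char combine_nibbles(unsigned char high_read, unsigned char low_read)
{
    unsigned char value =
        (unsigned char)(((high_read & 0x0F) << 4) | (low_read & 0x0F));

    /* Reinterpret the byte as two's complement, like the dx/dy fields. */
    return (signed char)value;
}
```

For instance, reads of 0x0F and 0x0E combine to 0xFE, which the `signed char` fields of MOUSE_DATA hold as -2.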

Another example: the Pset function:

// Structure for Pset
typedef struct {
	unsigned int X;
	unsigned int Y;
	unsigned char color;
	unsigned char vram_type;
	unsigned char lop;
} PSET;
/* PSET
* graphic mode 5 and 8 display of a point at coordinates x, y with color and logical operation
*/
void Pset(PSET *pxy) __naked __z88dk_fastcall
{
pxy;
__asm
	di
	call	WAIT_VDP
	ld		bc,(#AWRVDP)
	inc		c
	ld		a,#36
	out 	(c),a
	ld		a,#128+17
	out		(c),a
	inc		c
	inc		c
	outi
	outi
	outi
	outi
	dec     c
	dec     c
	ld		a,#44
	out		(c),a
	ld		a,#128+17
	out		(c),a
	inc		c
	inc		c
	outi
	outi
	ld		e,#0b01010000
	ld		a,(hl)
	and		#0x0f
	or		e
	out		(c),a
	ei
	ret

WAIT_VDP:
	ld		a,#2
	call	GET_STATUS
	and		#1
	jp		NZ,WAIT_VDP
	xor		a
	call	GET_STATUS
	ret

GET_STATUS:
	ld		bc,(#AWRVDP)
	inc		c
	out		(c),a
	ld		a,#128+15
	out		(c),a
	ld		bc,(#ARDVDP)
	inc		c
	in		a,(c)
	ret
__endasm;
}

// declaration :

static PSET pset;

in main function :

// call function :

	pset.X=100;
	pset.Y=100;
	pset.color=15;
	pset.vram_type=0; // 0= main VRAM
	pset.lop=0; // 0=IMP
	Pset(&pset);

ASM :

;testrom.c:181: pset.X=100;
	ld	hl,#0x0064
	ld	(_pset), hl
;testrom.c:182: pset.Y=100;
	ld	l, #0x64
	ld	((_pset + 0x0002)), hl
;testrom.c:183: pset.color=15;
	ld	hl,#(_pset + 0x0004)
	ld	(hl),#0x0f
;testrom.c:184: pset.vram_type=0; // 0= main VRAM
	ld	hl,#(_pset + 0x0005)
	ld	(hl),#0x00
;testrom.c:185: pset.lop=0; // 0=IMP
	ld	hl,#(_pset + 0x0006)
	ld	(hl),#0x00
;testrom.c:187: Pset(&pset);
	ld	hl,#_pset
	call	_Pset

No "push" and no "pop" in the generated assembly code.

By ToriHino

Champion (366)

25-09-2019, 10:19

Now that's finally an informative post again in this thread. I already see some useful hints on how to make better use of SDCC, thanks!

By PingPong

Prophet (3447)

25-09-2019, 12:49

I don't think we should leave out MSX specifics. We are focusing on MSX, so it is a good thing to see how those compilers fit MSX development.

Plus, I do not think that port-based access is an issue with either compiler; the banked out port extensions work fine on both and are not the bottleneck, IMHO.
What would be worth seeing, IMHO, is:
- MSX should use asm for better performance. However, some complex high-level tasks are a bit hard in asm, hence the need for C. It is unavoidable that switching to C loses a certain level of performance. The question is: is the performance sacrifice too heavy?
- As said, some parts have to be in asm. How much does the C/asm interface weigh in terms of performance?
- If I use C, it is because for specific parts I do not want to deal with the complexity of asm. So I need a relatively smart compiler that does not need too much (ideally zero) help in terms of tricks to achieve speed.
No tricks like static variables or manual subexpression elimination. Otherwise I will lose the benefit of reduced complexity (read: work to be done by hand) that I expect from a C compiler.
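
To make the two tricks named above concrete, here is a small sketch of the same function written three ways. All names are invented for illustration, and the sketch makes no claim about which SDCC or Hi-Tech C versions actually need the hand-optimized forms; it only shows what the extra manual work looks like.

```c
/* Plain version: the index expression y * width + x appears twice,
   and it is left to the compiler to notice the repetition. */
unsigned char blend_plain(const unsigned char *a, const unsigned char *b,
                          unsigned int x, unsigned int y, unsigned int width)
{
    return (unsigned char)((a[y * width + x] + b[y * width + x]) >> 1);
}

/* Manual subexpression elimination: the index is computed once by hand. */
unsigned char blend_manual(const unsigned char *a, const unsigned char *b,
                           unsigned int x, unsigned int y, unsigned int width)
{
    unsigned int offset = y * width + x;

    return (unsigned char)((a[offset] + b[offset]) >> 1);
}

/* Static-variable trick: the temporary lives at a fixed address instead
   of on the stack. Note that this makes the function non-reentrant. */
unsigned char blend_static(const unsigned char *a, const unsigned char *b,
                           unsigned int x, unsigned int y, unsigned int width)
{
    static unsigned int offset;

    offset = y * width + x;
    return (unsigned char)((a[offset] + b[offset]) >> 1);
}
```

All three return the same result; the difference lies only in how much the programmer has done on the compiler's behalf.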

The benchmarks should highlight those aspects. Plus, timing the execution speed and simply counting the T-states are equivalent. There is no need to run a routine to know how much faster it is if you already know its T-state count.
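
As a rough illustration of how a T-state count maps to run time, the host-side sketch below converts T-states to microseconds and counts the T-states of a bare `djnz` delay loop. It assumes the standard 3.579545 MHz MSX clock and uses the plain Z80 timings for `djnz` (13 T-states when the jump is taken, 8 when it falls through); the extra wait state that MSX machines add to each M1 cycle is not modelled. The function names are made up.

```c
#define Z80_CLOCK_HZ 3579545ULL   /* assumed standard MSX Z80 clock */

/* Converts a T-state count to microseconds, rounded down. */
unsigned long tstates_to_us(unsigned long tstates)
{
    return (unsigned long)(((unsigned long long)tstates * 1000000ULL)
                           / Z80_CLOCK_HZ);
}

/* T-states of "label: djnz label" entered with B = b.
   B = 0 means 256 iterations, as on the real CPU. */
unsigned long djnz_loop_tstates(unsigned char b)
{
    unsigned long iterations = (b == 0) ? 256UL : (unsigned long)b;

    /* Every iteration but the last takes the jump (13 T-states);
       the last one falls through (8 T-states). */
    return 13UL * (iterations - 1UL) + 8UL;
}
```

With B = 30 the loop costs 385 T-states under these assumptions, or about 107 microseconds.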

By akumajo

Rookie (29)

25-09-2019, 14:58

ToriHino wrote:

Now that's finally an informative post again in this thread. I already see some useful hints on how to make better use of SDCC, thanks!

Thank you ToriHino. I do not know if more detail is necessary, but the z88dk fastcall calling convention supports only one parameter of at most 32 bits, passed in registers.
8-bit values are passed in L (char/unsigned char), 16-bit values in HL (int), and 32-bit values in DEHL (long).
Return values are passed in registers.
8-bit return values should be passed in L, 16-bit values in HL, and 32-bit values in DEHL.
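
The register layout described above can be pictured with a small host-side sketch. The `DEHL` struct and the helper names are invented for illustration; the only assumption is the usual reading of DEHL, with DE holding the upper 16 bits and HL the lower 16 bits.

```c
#include <stdint.h>

/* Hypothetical picture of the register pairs used by fastcall. */
typedef struct {
    uint16_t de;   /* upper 16 bits of a 32-bit value */
    uint16_t hl;   /* lower 16 bits, or the whole 16-bit value */
} DEHL;

/* Shows where each half of a 32-bit argument would travel. */
DEHL split_dehl(uint32_t value)
{
    DEHL regs;

    regs.de = (uint16_t)(value >> 16);
    regs.hl = (uint16_t)(value & 0xFFFFu);
    return regs;
}

/* An 8-bit argument is the low half of HL, i.e. register L. */
uint8_t l_register(uint16_t hl)
{
    return (uint8_t)(hl & 0xFFu);
}
```

This is why the naked functions shown earlier read their single argument from L or HL without touching the stack.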

The __naked function attribute prevents the compiler from generating prologue and epilogue code for that function.
This gives you full control of the function code, but you are fully responsible for saving any registers that need to be preserved, and for returning via ret (or via a jp to a routine that ends with ret).

SDCC uses the IX register as a frame pointer. The frame pointer points to the space for a function's local variables, allocated on the stack. After storing used registers on the stack, pushing arguments on the stack and setting the frame pointer, SDCC will call a function and then restore the stack pointer and saved registers.

This behavior can be adjusted with compile options listed below or function attributes like __z88dk_callee.

The --callee-saves-bc compile option tells the compiler not to save BC on the stack before calling a function. If you do not use BC in your asm function, it can save the time and code size required to store and restore BC for each function call.

The --fomit-frame-pointer compile option causes the frame pointer to be omitted when the function uses no local variables. As per the SDCC documentation, for the z80 code generator the frame pointer will be omitted for all functions if this option is used.

--fno-omit-frame-pointer never omits the frame pointer, i.e. the frame pointer is always set up for each function, even without local variables. This generates some overhead in the prologue and epilogue of each function, but guarantees that IX always holds the frame pointer and local variables can be accessed via IX+n addressing.

This also explains why you saw both stack and IX methods of referencing function parameters in various examples - some of these examples used --fno-omit-frame-pointer and accessed parameters through IX register, while others just popped them from the stack.
