Functions

Terminology

You may have noticed that I’ve been using the term function for something referred to in the textbook as a subroutine. In general, these terms can be used interchangeably. While there are arguably some different implications to each term, they’re not widely agreed upon in the context of software.

A Few Arguments

If we have six arguments or fewer, we can construct our function quite simply. Let’s consider a function int max(int a, int b) that simply returns the larger of its two arguments. Take a look at max.s below:

! max - finds the max of two integers
fmt:	.asciz	"Max: %d\n"
	.align 	4
	.global main
main:
	save	%sp, -96, %sp

	! call max
	mov	20, %o0
	mov	28, %o1
	call	max
	nop

	! print the result
	mov	%o0, %o1
	set	fmt, %o0
	call	printf
	nop

	! exit
	mov	0, %o0	! exit with 0
	mov	1, %g1	! exit trap
	ta	0	! trap to system

! int max(int,int)
! returns the larger of the two given values
max:	save	%sp, -64, %sp
	cmp	%i0, %i1
	ble,a	maxret
	mov	%i1, %i0
maxret:
	ret
	restore

You can see before the call to max that we’re putting our arguments into %o0 and %o1. This is exactly the same as how we called other functions we’ve used previously, like printf.

Within max, we immediately save. This gives the function a fresh set of local and output registers, turns the caller’s output registers into our input registers, and adjusts the stack and frame pointers. You must reserve at least 64 bytes of stack so that the local and input registers can be stored there if the processor runs out of register windows.

The parameters we set in the %o0 and %o1 registers are now available in the %i0 and %i1 registers, respectively. We can compare them and, if necessary, move the value from %i1 into %i0 to ensure that the larger value is stored in %i0.

Finally, we reach the ret instruction, and we branch back to the instruction immediately following the delay slot of the call. But, our ret instruction has its own delay slot, which we’ve filled with restore to undo the save from the start of the function.

Having returned to main, the larger of the two values is guaranteed to be stored in %o0, because restore turns the function’s %i0 back into the caller’s %o0.
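
To make the annulled branch concrete, here is a minimal Python model of max. The name sparc_max and the use of plain variables to stand in for registers are assumptions made for illustration; they are not part of max.s.

```python
def sparc_max(o0, o1):
    """Rough model of max.s: return the larger of two integers."""
    # save: the caller's %o0 and %o1 become the callee's %i0 and %i1
    i0, i1 = o0, o1

    # cmp %i0, %i1 followed by ble,a maxret
    branch_taken = i0 <= i1

    # The ",a" (annul) suffix means the delay slot instruction
    # "mov %i1, %i0" only executes when the branch is taken.
    if branch_taken:
        i0 = i1

    # ret / restore: the callee's %i0 is the caller's %o0 again
    return i0


print(sparc_max(20, 28))  # prints 28
```

The point of the sketch is that the mov in the delay slot is conditional: it runs only on the path where %i1 is at least as large as %i0.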

Many Arguments

If we have more than six arguments, we run out of input registers and have to do things differently. Naively, it seems like you have enough registers, with %o0 through %o7 at your disposal, but %o6 and %o7 are reserved (%o6 is the stack pointer, and %o7 holds the return address of a call), so they cannot be used for arguments. Instead, we can place the remaining arguments on the stack. By convention, there are 24 bytes per frame left on the stack for function arguments. That’s like having space for six more registers!

Given that we’re going to be relying on these standard conventions in our program, we should probably just use the standard 96-byte save in our functions. In general, it’s simpler than trying to optimize our memory usage within each function.

Let’s consider a function int max8(int a, int b, int c, int d, int e, int f, int g, int h) that simply returns the largest argument it is given. We can store the last two values on the stack using the right offsets, as shown in max8.s below:

! max8 - finds the max of eight integers
fmt:	.asciz	"Max: %d\n"
	.align 	4
	.global main
main:
	save	%sp, -96, %sp

	! call max8
	mov	20, %o0
	mov	28, %o1
	mov	83, %o2
	mov	12, %o3
	mov	11, %o4
	mov	43, %o5
	mov	8, %l0
	st	%l0, [%sp + 4 + 64]
	mov	13, %l0
	st	%l0, [%sp + 8 + 64]
	call	max8
	nop

	! print the result
	mov	%o0, %o1
	set	fmt, %o0
	call	printf
	nop

	! exit
	mov	0, %o0	! exit with 0
	mov	1, %g1	! exit trap
	ta	0	! trap to system

! int max8(int,int,int,int,int,int,int,int)
! returns the largest of the eight given values
max8:	save	%sp, -96, %sp
	ld	[%fp + 4 + 64], %o0
	ld	[%fp + 8 + 64], %o1

! values are now in %i0, %i1, %i2, %i3, %i4, %i5, %o0, %o1

	cmp	%i0, %i1
	ble,a	1f
	mov	%i1, %i0
1:
	cmp	%i0, %i2
	ble,a	1f
	mov	%i2, %i0
1:
	cmp	%i0, %i3
	ble,a	1f
	mov	%i3, %i0
1:
	cmp	%i0, %i4
	ble,a	1f
	mov	%i4, %i0
1:
	cmp	%i0, %i5
	ble,a	1f
	mov	%i5, %i0
1:
	cmp	%i0, %o0
	ble,a	1f
	mov	%o0, %i0
1:
	cmp	%i0, %o1
	ble,a	1f
	mov	%o1, %i0
1:	ret
	restore

As you can see, it’s only a few extra lines of code to put additional parameters on the stack. By convention, the space from %sp + 68 to %sp + 92 is dedicated to input arguments. Note that we’re using positive offsets from the stack pointer to store our arguments. This is different from how we addressed local stack variables, which used negative offsets from the frame pointer.

Calling save in max8 changes the stack. The stack pointer moves to a lower address, and the frame pointer takes on the stack pointer’s old value. Consequently, when accessing our extra arguments from within max8, we now use the same positive offsets, but from the frame pointer.
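
The relationship between the caller’s %sp and the callee’s %fp can be sketched in Python. This is only a model using the offsets from max8.s; the dictionary standing in for memory, the helper names, and the stack address are all made up for illustration.

```python
FRAME_SIZE = 96   # bytes reserved by "save %sp, -96, %sp"
ARG_BASE = 64     # base offset used in max8.s: [%sp + 4 + 64], [%sp + 8 + 64]


def caller_store(memory, sp, value, slot):
    """Caller side: st value, [%sp + 4*slot + 64], where slot is 1 or 2."""
    memory[sp + 4 * slot + ARG_BASE] = value


def save(sp):
    """Model of save: the old %sp becomes %fp, and %sp moves down."""
    new_fp = sp
    new_sp = sp - FRAME_SIZE
    return new_fp, new_sp


def callee_load(memory, fp, slot):
    """Callee side: ld [%fp + 4*slot + 64]."""
    return memory[fp + 4 * slot + ARG_BASE]


memory = {}
caller_sp = 0x7FFF0000            # arbitrary, made-up stack address
caller_store(memory, caller_sp, 8, 1)
caller_store(memory, caller_sp, 13, 2)

callee_fp, callee_sp = save(caller_sp)
print(callee_load(memory, callee_fp, 1), callee_load(memory, callee_fp, 2))  # prints 8 13
```

Because the callee’s frame pointer equals the caller’s stack pointer, the same offsets reach the same memory on both sides of the call.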

Short Labels

As you use more functions and your programs grow in size, it may be annoying to specify unique, meaningful labels for every branch. If a named label would be more of a hindrance than a help, you can use a single-digit numeric label instead. These labels can be duplicated, and are branched to by specifying the digit followed by f or b. The suffix f (forward) indicates that you wish to branch to the next matching label, while the suffix b (backward) indicates that you wish to branch to the previous matching label.

I suggest only using these short labels when you’ll be branching from some place very nearby. The code above is easily understandable in part because there’s a clear pattern, and the label is only ever a couple lines away from the branch statement. The further the branch, the more important a descriptive label is.
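
The lookup rule for f and b can be sketched as a small Python function. This is an illustrative model of the rule described above, not how the assembler is actually implemented; the function name resolve and the sample lines are assumptions.

```python
def resolve(lines, branch_index, target):
    """Return the index of the line that a numeric label reference points to.

    lines        -- list of source lines
    branch_index -- index of the line containing the branch
    target       -- a reference such as "1f" or "1b"
    """
    digit, direction = target[0], target[1]
    label = digit + ":"
    if direction == "f":
        # search forward from the line after the branch
        candidates = range(branch_index + 1, len(lines))
    elif direction == "b":
        # search backward from the line before the branch
        candidates = range(branch_index - 1, -1, -1)
    else:
        raise ValueError("suffix must be f or b")
    for i in candidates:
        if lines[i].strip().startswith(label):
            return i
    raise LookupError("no matching label")


source = [
    "1:",                    # index 0
    "    cmp   %i0, %i1",    # index 1
    "    ble,a 1f",          # index 2
    "    mov   %i1, %i0",    # index 3
    "1:",                    # index 4
    "    cmp   %i0, %i2",    # index 5
    "    ble,a 1b",          # index 6 (hypothetical backward branch)
    "    nop",               # index 7
    "1:  ret",               # index 8
]

print(resolve(source, 2, "1f"))  # prints 4
print(resolve(source, 6, "1b"))  # prints 4
```

Notice that the same label, 1, appears three times, and each reference still has exactly one meaning.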

Inline Functions

Macros

Inline functions can be created using macros. They are very simple: the macro processor (m4) copies and pastes code from the macro definition to each location where the macro is used.

It’s slightly fancier than that, though, because within your macro code, you get the special variables $1, $2, $3, etc. When the macro processor does its copying and pasting, it replaces $1, $2, $3, and so forth, with the 1st, 2nd, 3rd and subsequent arguments given to the macro.
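
The substitution step can be imitated in a few lines of Python. This is only a sketch of the $1, $2, $3 replacement, not a reimplementation of m4; the function name expand and the sample body are assumptions.

```python
import re


def expand(body, args):
    """Replace $1 through $9 in body with the matching argument."""
    def substitute(match):
        index = int(match.group(1))
        # a missing argument is replaced with nothing
        return args[index - 1] if index <= len(args) else ""
    return re.sub(r"\$([1-9])", substitute, body)


body = "sll\t$1, 31, $3"
print(expand(body, ["%l0", "%l1", "%l2"]))  # prints: sll	%l0, 31, %l2
```

Every occurrence of a given $n is replaced, so an argument can be used as many times as needed in the macro body.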

The # character is the macro comment character, which we can use when documenting the behaviour and parameters of our function. The # prefixed lines are not discarded; they do show up in the .s file, but gcc ignores them. Alternatively, we could define a comment macro that replaces all inputs with nothing.

Check out the program below, shift.m.

# rashift(hir, lor, tmp)
#
# rashift implements an arithmetic right shift by 1 bit,
# acting on a pair of 32bit registers as if they were a
# single 64bit register.
#
# hir - the high register
# lor - the low register
# tmp - a temporary register for use for calculation
define(rashift, `! `rashift' $1, $2, $3
	sll	$1, 31, $3
	sra	$1,  1, $1
	srl	$2,  1, $2
	or	$3, $2, $2')

fmt:	.asciz	"value = 0x%x%x\n"
	.align	4
	.global	main
main:	save	%sp, -96, %sp

	! initialize with test data
	set	0x84218421, %l0
	set	0x82411248, %l1

	! print value before `rashift'
	set	fmt, %o0
	mov	%l0, %o1
	call	printf
	mov	%l1, %o2

	! do `rashift'
	rashift(%l0, %l1, %l2)

	! print value after `rashift'
	set	fmt, %o0
	mov	%l0, %o1
	call	printf
	mov	%l1, %o2

	! exit program
	mov	0, %o0  !return zero on exit
	mov	1, %g1  !exit request
	ta	0       !trap to system
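
To check that the four instructions in rashift really behave like a 64-bit arithmetic shift, here is a Python model of them. The helper names and the 32-bit masking are assumptions of this sketch; Python integers are unbounded, so the masking has to be done by hand.

```python
MASK32 = 0xFFFFFFFF


def sra32(value, count):
    """Arithmetic right shift of a 32-bit register value."""
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> count) & MASK32


def rashift(hi, lo):
    """Model of the rashift macro acting on a (hi, lo) register pair."""
    tmp = (hi << 31) & MASK32     # sll $1, 31, $3 : low bit of hi moves to bit 31
    hi = sra32(hi, 1)             # sra $1,  1, $1 : sign-preserving shift
    lo = lo >> 1                  # srl $2,  1, $2 : zero-filling shift
    lo = tmp | lo                 # or  $3, $2, $2 : carry the bit into lo
    return hi, lo


hi, lo = rashift(0x84218421, 0x82411248)
print("value = 0x%08x%08x" % (hi, lo))  # prints value = 0xc210c210c1208924
```

The sketch pads each half to eight hex digits so that leading zeros in the low word are not lost when the two halves are printed side by side.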

Escaping Macros

Notice that I quoted rashift every time I mentioned it in a comment? This is very important to prevent m4 from trying to expand the macro. If you forget, m4 will insert broken code into your program. Or worse, if you forget within the macro definition of rashift itself, you’ll cause an endless loop resulting in this error message:

m4:shift.m:26 pushed back more than 4096 chars

So, be sure to remember to quote any comments referring to the function, or you’re going to have problems.

Comments

I started the macro definition with ! `rashift' $1, $2, $3. This is an assembly comment, so that we can easily find our usage of rashift within the .s file and see how it was called.

Inspecting the .s generated by m4 is sometimes quite helpful in debugging problems that may not be obvious in the .m file, so it’s worth leaving hints to help track problematic code in the .s back to its origin in the .m.

In fact, let’s take a look at the .s file right now.

# rashift(hir, lor, tmp)
#
# rashift implements an arithmetic right shift by 1 bit,
# acting on a pair of 32bit registers as if they were a
# single 64bit register.
#
# hir - the high register
# lor - the low register
# tmp - a temporary register for use for calculation


fmt:	.asciz	"value = 0x%x%x\n"
	.align	4
	.global	main
main:	save	%sp, -96, %sp

	! initialize with test data
	set	0x84218421, %l0
	set	0x82411248, %l1

	! print value before rashift
	set	fmt, %o0
	mov	%l0, %o1
	call	printf
	mov	%l1, %o2

	! do rashift
	! rashift %l0, %l1, %l2
	sll	%l0, 31, %l2
	sra	%l0,  1, %l0
	srl	%l1,  1, %l1
	or	%l2, %l1, %l1

	! print value after rashift
	set	fmt, %o0
	mov	%l0, %o1
	call	printf
	mov	%l1, %o2

	! exit program
	mov	0, %o0  !return zero on exit
	mov	1, %g1  !exit request
	ta	0       !trap to system

As you can see, not much changed. Our define at the top is gone. Our # comments have stayed, and our rashift(%l0, %l1, %l2) has been replaced with code from our define.

It’s worth noting, though, that the additions and removals mean that the line numbers don’t always match between the .s and .m. That’s something to remember when gcc tells you that there’s an error in the .s file on some particular line.

Big Literals

What is a Literal?

First off, I want to be very clear that the information in this post only applies to literals. So, it’s important we have our terms straight to begin with. A literal is a value that appears in your code directly. For example, in add %l0, 23, %l1, the value 23 is a literal.

Using Big Literals

Let’s go back to Hello World. Back then, life was simple. The meaning of life was 42, and everything just worked. But in today’s modern world, nothing can remain so small and simple for long. Now the meaning of life is 425364. Easy change, right?

fmt:	.asciz	"the meaning of life is %d\n"
	.align	4
	.global	main
main:	save	%sp, -96, %sp

	set	fmt, %o0
	mov	425364, %o1
	call	printf
	nop
	mov	1, %g1
	ta	0

There’s only one problem: it won’t compile.

$ gcc answer.s -o answer && ./answer
answer.s: Assembler messages:
answer.s:6: Error: relocation overflow

But why? I mean, we know that SPARC registers are 32-bit. That gives us 2^32 (4294967296) possible values! Our value of 425364 is way smaller than that. So what gives?

Instruction Format

The key takeaway here is that while we may have 32-bit registers, we also have 32-bit instructions. Obviously, we can’t store the instruction information and a full 32-bit literal in a 32-bit instruction.

mov 42, %o1 is a synthetic instruction which translates into or %g0, 42, %o1. The instruction reference for or at the back of the textbook shows that the opcode, input register and output register identifiers alone take up 19 bits. That leaves 13 bits for the literal.

The literal is considered signed, though, which means that half the values are negative. This gives us 12 bits dedicated to positive numbers and zero. Thus, the maximum value we can directly set using mov is 4095 (0xFFF).
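
The bit budget can be made concrete with a Python sketch that packs or %g0, 42, %o1 into a 32-bit word. It assumes the standard SPARC V8 layout for this instruction format (2-bit op, 5-bit rd, 6-bit op3, 5-bit rs1, 1-bit immediate flag, 13-bit literal); the function name is made up for illustration.

```python
def encode_or_immediate(rs1, simm13, rd):
    """Encode "or rs1, simm13, rd" as a 32-bit SPARC instruction word."""
    if not -4096 <= simm13 <= 4095:
        raise ValueError("literal does not fit in 13 signed bits")
    op = 2        # 2 bits: arithmetic/logical instruction group
    op3 = 0x02    # 6 bits: selects "or"
    i = 1         # 1 bit: second operand is a literal, not a register
    return ((op << 30) | (rd << 25) | (op3 << 19) |
            (rs1 << 14) | (i << 13) | (simm13 & 0x1FFF))


G0, O1 = 0, 9     # register numbers: %g0 is r0, %o1 is r9
print(hex(encode_or_immediate(G0, 42, O1)))  # prints 0x9210202a
```

The fields other than the literal add up to 2 + 5 + 6 + 5 + 1 = 19 bits, which is where the 13-bit limit comes from. Calling the function with 425364 raises the ValueError, which mirrors the assembler refusing the original program.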

Note that if you put a positive value between 4096 (0x1000) and 8191 (0x1FFF), that value will be interpreted as the 13-bit negative number with that bit pattern. It will then be loaded into the 32-bit register as that negative value. That conversion from a 13-bit negative value to a 32-bit negative value is basically going to prefix a bunch of Fs on your value. So, 0x1234 will become 0xFFFFF234.
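
Here is a small Python sketch of that sign extension, to show where the Fs come from. The function name sign_extend_13 is an assumption of this example.

```python
def sign_extend_13(bits):
    """Interpret the low 13 bits as signed and widen to a 32-bit pattern."""
    bits &= 0x1FFF                # keep only the 13-bit field
    if bits & 0x1000:             # bit 12 is the sign bit
        bits -= 0x2000            # negative value in two's complement
    return bits & 0xFFFFFFFF      # show it as a 32-bit register would hold it


print(hex(sign_extend_13(0x1234)))  # prints 0xfffff234
print(hex(sign_extend_13(0x0FFF)))  # prints 0xfff
```

Values up to 0xFFF have a clear sign bit and come through unchanged, while anything with bit 12 set has the upper 19 bits filled with ones.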

Just be sure you never try to mov a literal greater than 0xFFF.

The Solution

When loading big literals into a register, you need to use set, which is a synthetic instruction. For literal values too big to fit in a single instruction, set actually becomes two instructions (so don’t use it in a delay slot!): first a sethi instruction to set the most significant 22 bits, then an or to set the least significant 10 bits.
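
The split performed for the two instructions can be sketched in Python. This models only the arithmetic, with 22 high bits for sethi and 10 low bits for or; the function names are assumptions of this example.

```python
def split_for_set(value):
    """Split a 32-bit literal into the sethi part and the or part."""
    value &= 0xFFFFFFFF
    hi22 = value >> 10        # upper 22 bits, loaded by sethi
    lo10 = value & 0x3FF      # lower 10 bits, merged in by or
    return hi22, lo10


def combine(hi22, lo10):
    """What the register holds after sethi followed by or."""
    return ((hi22 << 10) | lo10) & 0xFFFFFFFF


hi22, lo10 = split_for_set(425364)
print(hi22, lo10)             # prints 415 404
print(combine(hi22, lo10))    # prints 425364
```

The low part is at most 0x3FF, so it always fits comfortably within the 13-bit literal field of the or instruction.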

So, our fix is simple:

fmt:	.asciz	"the meaning of life is %d\n"
	.align	4
	.global	main
main:	save	%sp, -96, %sp

	set	fmt, %o0
	set	425364, %o1	! use set for large values
	call	printf
	nop
	mov	1, %g1
	ta	0

And now:

$ gcc answer.s -o answer && ./answer
the meaning of life is 425364

Registers Are OK

Unlike literals, values in registers do not have these limitations. The problem here was caused by running out of room to represent the literal within our instruction. When we refer to registers within our instructions, however, they take up a constant amount of space. That is to say, no matter how big the value contained in %l0, we can still just refer to it as %l0.

So, remember to use set to load your big literals into registers. Once they’re in there, you can manipulate the registers with mov, add, or and all the other operations you’re used to. And you won’t have to worry about this.

Intro to GDB

Reminder!

The lab computers have an old, buggy version of GDB installed as the default. Be sure to use /usr/local/bin/gdb, otherwise you will encounter problems. For example, with the buggy version of gdb, the ni command may not return after calling .mul.

Basic Commands

This material is covered quite well in Section 2.7 of the textbook. But, for reference, here are a few examples of useful gdb commands for debugging assembly programs:

/usr/local/bin/gdb a.out # open a program named a.out in gdb
(gdb) break main       # stop program upon reaching main label
(gdb) run              # execute until you hit a break point
(gdb) x/5i main        # print the first 5 instructions in main
(gdb) display/i $pc    # print the current instruction at each stop
(gdb) ni               # execute the next instruction
(gdb) print $l0        # print the contents of register %l0
(gdb) break *main+8    # stop two instructions after main
(gdb) delete break 1   # remove the first breakpoint created
(gdb) delete display 1 # remove the first display created
(gdb) continue         # resume execution until the next break point
(gdb) quit             # exit the program

Common Errors

  • Using gdb rather than /usr/local/bin/gdb
  • Using % rather than $ in display and print
  • Debugging the wrong program (e.g. because you forgot to recompile)

Basic SPARC

Compiling and Running

Like C programs, assembly code must also be compiled to create an executable. Assembly source code files use the extension .s and can be passed to GCC in exactly the same way as C. That is, a source file called hello.s could be compiled into an executable named hello with the command gcc hello.s -o hello.

Give it a try with the following code in hello.s. Remember that to run your program you need to type ./hello, because you must specify which directory your program resides in!

fmt:	.asciz	"the meaning of life is %d\n"
	.align	4
	.global	main
main:	save	%sp, -96, %sp

	set	fmt, %o0
	mov	42, %o1
	call	printf
	nop
	mov	1, %g1  !exit request
	ta	0       !trap to system

Macros

It’s helpful to be able to use macros in our programs. Rather than editing a .s file directly, we’ll put our code in a different file and then generate the raw assembly. The file name doesn’t matter for this, but let’s use the extension .m as our convention. The macro file can be compiled into an assembly file like so: m4 hello.m > hello.s

Putting those two steps together, we get this process to transform a .m file into an executable file:

m4 hello.m > hello.s
gcc hello.s -o hello

Doing both of these steps each time you want to compile your program gets a little tedious. You can combine them into a single command using &&. It ends up looking like m4 hello.m > hello.s && gcc hello.s -o hello. The second part of the command will only execute if the first part succeeds.

Here’s a macro program named expr.m that you could try building. It may be a handy example to practice debugging.

/* This program computes the expression:
  y = (x - 1) * (x - 7) / (x - 11) for x = 9
The polynomial coefficients are:
*/
  define(a2, 1)
  define(a1, 7)
  define(a0, 11)

/* Variables x and y are stored in %l0 and %l1 */
  define(x_r, l0)
  define(y_r, l1)

  .global	main
  .align	4
main:
  save	%sp, -96, %sp
  mov	9, %x_r		!initialize x
  sub	%x_r, a2, %o0	!(x - a2) into %o0
  sub	%x_r, a1, %o1	!(x - a1) into %o1
  call	.mul
  nop			!result in %o0
  sub	%x_r, a0, %o1	!(x - a0) into %o1, the divisor
  call	.div
  nop			!result in %o0
  mov	%o0, %y_r	!store it in y

  mov	1, %g1		!exit request
  ta	0		!trap to system
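
When stepping through expr.m in gdb, it helps to know what value to expect in %l1 at the end. The Python sketch below computes it from the same expression; the function names are made up for illustration, and it assumes that .div truncates toward zero (which makes no difference for x = 9, since the division is exact).

```python
def trunc_div(a, b):
    """Integer division that truncates toward zero (assumed .div behaviour)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def expected_y(x, a2=1, a1=7, a0=11):
    """y = (x - a2) * (x - a1) / (x - a0), as computed by expr.m."""
    product = (x - a2) * (x - a1)     # the call to .mul
    return trunc_div(product, x - a0)  # the call to .div


print(expected_y(9))  # prints -8
```

For x = 9 the intermediate values are 8 and 2 going into .mul, 16 coming out, and -2 as the divisor, which are handy checkpoints to compare against print $o0 and print $o1 while debugging.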