Avoid ‘float’ Parameters on Xbox 360 (XNA)

I said previously that I would only describe inefficiencies of the NetCF JIT compiler on the Xbox when I could provide a reasonable workaround. This tip is a reasonable one: avoid using float as a parameter type in method signatures, and use double instead. Here’s why…

NetCF uses a calling convention that passes all arguments on the stack. In addition, it passes every float argument as a double: the caller pushes each 4-byte float argument onto the stack as an 8-byte double. The called function then converts each of those arguments back to a float and writes it into the same stack slot. It does this by loading the double value into a register, rounding it to single precision, and storing it back to the stack as a single.

Why? I have no idea, but that’s not important. What’s important is that this superfluous conversion is expensive, and you’re better off avoiding it.

The cost is paid on entry to the function, where each float parameter is loaded into a register, rounded, and stored back onto the stack.
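A quick way to convince yourself that this conversion is pure overhead is to note that every float is exactly representable as a double. Rounding the widened value back to single precision therefore always returns the original bits. The sketch below models the two halves of the convention in ordinary desktop C#. `WidenForCall` and `RoundOnEntry` are hypothetical helpers standing in for the caller's push and the callee's prologue; they are not anything NetCF exposes.

```csharp
using System;

// Desktop model of the calling convention described above; this is NOT NetCF code.
// WidenForCall and RoundOnEntry are hypothetical names that mirror what the JIT emits.
public static class FloatArgModel
{
    // Caller side: a 4-byte float argument is pushed as an 8-byte double.
    // float -> double is an implicit, exact widening in C#.
    public static double WidenForCall(float arg)
    {
        return arg;
    }

    // Callee prologue: load the double slot, round it to single precision
    // (frsp on PowerPC), and store it back as a float (lfd / frsp / stfs).
    public static float RoundOnEntry(double slot)
    {
        return (float)slot;
    }

    // True when widening and rounding back reproduces the original bit pattern.
    public static bool RoundTripIsExact(float value)
    {
        float back = RoundOnEntry(WidenForCall(value));
        int before = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        int after = BitConverter.ToInt32(BitConverter.GetBytes(back), 0);
        return before == after;
    }

    public static void Main()
    {
        float[] samples = { 0f, 1f, -2.5f, 0.1f, float.Epsilon, float.MaxValue, float.PositiveInfinity };
        foreach (float f in samples)
        {
            // Every sample prints True: the prologue conversion costs
            // instructions but never changes the argument's value.
            Console.WriteLine("{0:R} round-trips exactly: {1}", f, RoundTripIsExact(f));
        }
    }
}
```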

If you rewrite the method to take a double parameter instead of a float, the code generated at the caller stays the same (at the call site, pushing a float or a double onto the stack costs the same). The method itself, however, no longer contains the expensive load-round-store sequence at the start of its body.
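Here is roughly what that rewrite might look like in C#. `ScaleF` and `ScaleD` are hypothetical methods used only for illustration.

Call sites don't have to change, because C# converts float arguments to double implicitly. For a single multiply of two float inputs, the two versions also return the same value. The exact product of two floats fits in a double, so rounding it once to single precision matches a float multiply.

```csharp
using System;

// Hypothetical example; only the parameter types matter here.
public static class Scaling
{
    // Before: on the Xbox NetCF JIT, each float parameter costs a
    // load/round/store sequence in the method prologue.
    public static float ScaleF(float value, float factor)
    {
        return value * factor;
    }

    // After: double parameters are used as-is; the only rounding left is
    // the explicit cast when the result is returned as a float.
    public static float ScaleD(double value, double factor)
    {
        return (float)(value * factor);
    }

    public static void Main()
    {
        float v = 3.0f, k = 0.5f;
        // The call sites are identical: float converts implicitly to double.
        Console.WriteLine(ScaleF(v, k));
        Console.WriteLine(ScaleD(v, k));
    }
}
```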

Note: the disassembly below illustrates this.

An important caveat: float fields inside compound types (e.g., structs) are not converted this way when the containing type is passed by value. For example, a Vector2 passed by value occupies 8 bytes on the stack (4 bytes for each of its two float fields).

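This only helps if you pass the whole structure. Passing one of its members on its own (for example, v.X) as a float argument still goes through the double conversion.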

This implies a big difference between the following functions:

float AddXandY(Vector2 v);

float AddXandY(float x, float y);

The first function will receive an 8-byte Vector2 value on the stack. The second function will receive 2 x 8-byte double values on the stack, which it will have to round to single-precision before executing any of its function body.
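Below is a compilable version of the two signatures for experimenting on the desktop. `Vec2` is a minimal stand-in for XNA's `Vector2` (two public float fields, `X` and `Y`) so that the sketch builds without the XNA assemblies.

Note the second call in `Main`. Passing the members individually selects the float overload, so on the Xbox that call would pay for the conversion even though the data came from a struct.

```csharp
using System;

// Hypothetical stand-in for XNA's Vector2 so the sketch compiles without XNA.
public struct Vec2
{
    public float X, Y;
    public Vec2(float x, float y) { X = x; Y = y; }
}

public static class AddExamples
{
    // Struct passed by value: 8 bytes on the stack, no per-field conversion.
    public static float AddXandY(Vec2 v)
    {
        return v.X + v.Y;
    }

    // Two float parameters: on the Xbox NetCF JIT each arrives as an 8-byte
    // double and is rounded back to single precision in the prologue.
    public static float AddXandY(float x, float y)
    {
        return x + y;
    }

    public static void Main()
    {
        Vec2 v = new Vec2(1.25f, 2.5f);
        Console.WriteLine(AddXandY(v));        // whole struct: no conversion
        Console.WriteLine(AddXandY(v.X, v.Y)); // members as floats: converted path
    }
}
```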

Take a look at the native code.

; float AddXandY(Vector2 v)
; (0xBE29F26C) size=92 bytes
mflr    r12
stw     r12, 8(r15)
addi    r1, r1, -44
stw     r15, 0(r1)
addi    r15, r1, 0
addis   r3, r0, 48681
ori     r3, r3, 62060
addi    r5, r0, 0
stw     r5, 20(r15)
stw     r3, 12(r15)
stw     r15, 0(r30)
; [green highlight] load v.X and v.Y via offsets from the struct's address, then add
addi    r4, r15, 44
lfs     fr1, 0(r4)
addi    r4, r15, 44
lfs     fr2, 4(r4)
fadd    fr1, fr1, fr2
; [end green highlight]
frsp    fr1, fr1
lwz     r15, 0(r15)
addi    r1, r1, 52
stw     r15, 0(r30)
lwz     r12, 8(r15)
mtlr    r12
blr

; float AddXandY(float x, float y)
; (0xBE29F3CC) size=108 bytes
mflr    r12
stw     r12, 8(r15)
addi    r1, r1, -44
stw     r15, 0(r1)
addi    r15, r1, 0
addis   r3, r0, 48681
ori     r3, r3, 62412
addi    r5, r0, 0
stw     r5, 20(r15)
stw     r3, 12(r15)
stw     r15, 0(r30)
; [orange highlight] convert each double argument back to single precision in place
lfd     fr0, 44(r15)
frsp    fr0, fr0
stfs    fr0, 44(r15)
lfd     fr0, 52(r15)
frsp    fr0, fr0
stfs    fr0, 52(r15)
; [green highlight] load the single-precision arguments and add them
lfs     fr1, 52(r15)
lfs     fr2, 44(r15)
fadd    fr1, fr1, fr2
; [end green highlight]
frsp    fr1, fr1
lwz     r15, 0(r15)
addi    r1, r1, 60
stw     r15, 0(r30)
lwz     r12, 8(r15)
mtlr    r12
blr

In the two disassembled functions above, the orange highlight marks the code that converts the double arguments back to floats. Only the second function has it, because that conversion is present only when a function signature contains float parameters.

The green highlights mark loading the single-precision arguments and adding them. The values are loaded differently because the parameter types differ: struct fields are loaded via an offset from the address of the containing value.

The disassembly shows other obvious inefficiencies, too. Unfortunately, most of them are unavoidable, or working around one just trades it for another with no clear winner in all situations. That’s why I said I would only write about inefficiencies that have practical workarounds. Avoiding float parameters is one such workaround.

Here is the function declared with double parameters:

; float AddXandY(double x, double y)
; (0xBE68F4DC) size=88 bytes
mflr    r12
stw     r12, 8(r15)
addi    r1, r1, -44
stw     r15, 0(r1)
addi    r15, r1, 0
addis   r3, r0, 48744
ori     r3, r3, 62684
addi    r5, r0, 0
stw     r5, 20(r15)
stw     r3, 12(r15)
stw     r15, 0(r30)
; [green highlight] load the double arguments directly and add them
lfd     fr1, 52(r15)
lfd     fr2, 44(r15)
fadd    fr1, fr1, fr2
; [end green highlight]
frsp    fr1, fr1
; [orange highlight] redundant second round to single precision
frsp    fr1, fr1
lwz     r15, 0(r15)
addi    r1, r1, 60
stw     r15, 0(r30)
lwz     r12, 8(r15)
mtlr    r12
blr

The main difference between this function and the float version is that the arguments are not converted to single precision and written back to the stack before they are added. Instead, they are loaded directly as doubles (green highlight).

A word of caution: before converting all your float parameters to double, take a look at the orange highlight. I’m fairly certain it is a bug in the JIT compiler, because the highlighted instruction is a redundant rounding operation (frsp is “floating-point round to single-precision”). This happens whenever you cast from double to float. I’m pointing it out because it matters when you need to store the result of floating-point arithmetic in a float variable (such as a field in a struct). In that case, using doubles could do more harm than good, depending on how much arithmetic you do before storing the result.

It’s worth noting that casting from float to double in an expression costs nothing. The CPU automatically converts every floating-point value to double precision when it loads the value into a register. For this reason, when you mix double- and single-precision values in an expression, it is preferable to perform the arithmetic at double precision.

For example, prefer this:

float result = (float)(((double)singleValue + doubleValue1) * doubleValue2);

over this:

float result = (singleValue + (float)doubleValue1) * (float)doubleValue2;

In the first case, casting singleValue to double doesn’t use an instruction, but the explicit cast for the result emits two rounding instructions. In the second case, casting doubleValue1 and doubleValue2 to singles causes explicit rounding, as well as an additional rounding before storing the result (required before storing any float value).
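The sketch below wraps the two expressions in methods so they can be compared side by side. The parameter names follow the snippets above; the method names are made up for illustration.

For the sample inputs, both methods return the same value. For inputs that aren't exactly representable, the results can differ in the last bits, since the second form rounds at every cast.

```csharp
using System;

public static class MixedPrecision
{
    // Preferred: widening the float is free (the CPU loads it as a double
    // anyway), all arithmetic happens in double, and the result is rounded once.
    public static float WidenThenRound(float singleValue, double doubleValue1, double doubleValue2)
    {
        return (float)(((double)singleValue + doubleValue1) * doubleValue2);
    }

    // Avoid: each (float) cast on a double is an explicit rounding, and the
    // float result is rounded again before it is stored.
    public static float NarrowEachOperand(float singleValue, double doubleValue1, double doubleValue2)
    {
        return (singleValue + (float)doubleValue1) * (float)doubleValue2;
    }

    public static void Main()
    {
        float s = 1.5f;
        double a = 2.25, b = 4.0;
        Console.WriteLine(WidenThenRound(s, a, b));
        Console.WriteLine(NarrowEachOperand(s, a, b));
    }
}
```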

Avoiding float parameters is a reasonable perf tip, but if you do it, you must be cautious about mixing float and double values.

Another tip I can provide is to avoid creating very small functions. That should be obvious from the disassembly, but in case it isn’t, there is a lot of painfully expensive overhead in the examples I provided. (Wouldn’t it be great if you could force those little functions to be inlined?)
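The article doesn't name a way to force inlining on the Xbox, so the fallback is to inline small, hot helpers by hand. The sketch below uses hypothetical names and writes the same loop both ways. Assuming the JIT does not inline the helper itself (as the disassembly suggests), the second version avoids one call, with its prologue and epilogue, per element.

```csharp
using System;

public static class Inlining
{
    // Small helper: on the Xbox NetCF JIT, each call pays the prologue and
    // epilogue overhead seen in the disassembly.
    static float AddXandY(double x, double y)
    {
        return (float)(x + y);
    }

    // Calls the helper once per element (assumes xs and ys have equal length).
    public static float SumWithHelper(float[] xs, float[] ys)
    {
        float total = 0f;
        for (int i = 0; i < xs.Length; i++)
            total += AddXandY(xs[i], ys[i]);
        return total;
    }

    // Same computation with the helper's body written inline by hand.
    public static float SumInlined(float[] xs, float[] ys)
    {
        float total = 0f;
        for (int i = 0; i < xs.Length; i++)
            total += (float)((double)xs[i] + ys[i]);
        return total;
    }

    public static void Main()
    {
        float[] xs = { 1f, 2f, 3f };
        float[] ys = { 0.5f, 0.25f, 0.125f };
        Console.WriteLine(SumWithHelper(xs, ys));
        Console.WriteLine(SumInlined(xs, ys));
    }
}
```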

Happy coding!

PS: My highlights weren’t preserved when I published the article, so I’ve attempted to restore them as best I could. I apologize if the assembly code is hard to read.


About badcorporatelogo

My name is Stephen Styrchak, and I work as a software developer in Seattle, Washington, USA.
This entry was posted in Programming, XNA Game Studio.

9 Responses to Avoid ‘float’ Parameters on Xbox 360 (XNA)

  1. ericcosky says:

    Thank you for taking the time to analyze this. I had heard rumors about the behavior of floats/doubles on the XBox XNA but had nothing solid to go with and never took the time to look into as carefully as you did. Useful stuff!

  2. someguy says:

    Brilliant! Please consider posting your tool, it sounds amazingly useful!

  3. Pingback: Xbox/XNA performance tuning « IceFall Games
