I've always heard it's a good idea to test multithreaded apps on dual-cores, and now I see why.
The reason, of course, is that multiple threads on a single core are pretty useless or even detrimental, except when a thread is waiting on I/O.

/* last two posts */
That... sounds very inefficient. I would recommend the non-locking pattern I talked about earlier (I don't recall which forum; this was quite some time ago). I used a combination of message passing and volatile memory, set up so that a variable is only ever written by one thread but may be read by many (except in a few cases, also described here). For example, an actor class has a number of parts: a physics body, a graphical body, and input (from a player, an AI, or what-not).

Input worked, in simplified form, like this: when an input was received, all listeners of that input event were called. I have a nice binding system, so any number of virtual 'keys' (up, down, jump, rotate_up, etc.) can be added at any point and bound to any number of other virtual keys. Real keys always called a certain named virtual key (so the up key called the virtual key key_up, w called key_w, and so forth). The value of a virtual key was a floating-point number bound between 0 and 1. Keyboard keys, mouse buttons, joystick buttons, mouse axes, joystick axes, etc. always mapped to a certain virtual key. Axes were split so each direction went to a different virtual key, such as joy0_0_pos, joy0_0_neg, joy0_1_pos, etc., always bound between 0 and 1, so pushing up or down on something could do different things. You could even pass in a filter when binding one virtual key to another, so that listeners were only called if the value fell within some range, like 0.2 to 0.38 or whatever.

When an input came in and a listener received the event and acted on it, the chain looked like this: say the key was 'key_w', which called 'forward_0', which called the listener bound inside player controller 0's callback 'forward', which then called the attached actor's 'forward' callback, which atomically set a volatile variable to its value plus the craft's forward rate (say 80.0).
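To make the binding idea concrete, here is a rough single-threaded sketch of virtual keys bound to other virtual keys with an optional range filter. This is not my original code: `InputBindings`, `Filter`, `bind`, `listen`, and `fire` are made-up names for illustration, and it assumes bindings form no cycles.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical range filter: a binding only forwards values inside [lo, hi].
struct Filter {
    float lo;
    float hi;
    bool accepts(float v) const { return v >= lo && v <= hi; }
};

// Hypothetical binding table. Not thread-safe; assumes no binding cycles.
class InputBindings {
    struct Link {
        std::string to;
        Filter filter;
    };
    std::unordered_map<std::string, std::vector<Link>> links_;
    std::unordered_map<std::string, std::vector<std::function<void(float)>>> listeners_;

public:
    // Bind virtual key `from` to virtual key `to`, optionally with a filter.
    void bind(const std::string& from, const std::string& to,
              Filter f = Filter{0.0f, 1.0f}) {
        links_[from].push_back(Link{to, f});
    }

    // Register a listener that is called whenever `key` fires.
    void listen(const std::string& key, std::function<void(float)> cb) {
        listeners_[key].push_back(std::move(cb));
    }

    // Fire a virtual key. The value is clamped to [0, 1], delivered to the
    // key's own listeners, then forwarded along every binding whose filter
    // accepts it.
    void fire(const std::string& key, float value) {
        value = std::max(0.0f, std::min(1.0f, value));
        auto ls = listeners_.find(key);
        if (ls != listeners_.end()) {
            for (auto& cb : ls->second) cb(value);
        }
        auto lk = links_.find(key);
        if (lk != links_.end()) {
            for (auto& link : lk->second) {
                if (link.filter.accepts(value)) fire(link.to, value);
            }
        }
    }
};
```

With this, binding 'key_w' to 'forward_0' and listening on 'forward_0' gives the chain described above, and a binding with `Filter{0.2f, 0.38f}` only passes values in that range.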
By atomically I mean that the variable fit into a register and I could do an atomic CAS to set it; if other things were trying to set it at the same time, they would all succeed in the proper way. When the physics thread comes by, it reads that value atomically (atomically because the full value will not fit into a register, and without an atomic read different axes could be wrong if something was writing to it as well), applies the proper forces, etc., then atomically reads the value again and sets it to that minus what it read before (in case something wrote to it while the physics forces were being calculated).

When the physics finishes calculating a world step, it calls a callback on the bodies to tell them of their updated position and orientation. That callback sends a message to the graphics thread. When the graphics thread goes through its messages (once per frame), it sees that the body says its position has been updated, atomically reads the new position and orientation, updates the graphical body, and continues on.

The AIControllers that control actors would do what they can during an update, making sure they are thread-safe of course (meaning using atomic primitives when possible, not locks), and just put their work as tasks in an atomic circular queue, so that when any thread has spare time it can pull the next task off and run it. With enough of the system running as those 'tasks' in small enough timeslices (not too small though), it could scale exceedingly well with many cores, whereas the method posted above does not.
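Here is a minimal sketch of the CAS-based "add on input, subtract what was read on physics" step, for a single float only. It uses `std::atomic<float>` in place of the volatile variable plus raw CAS I described, and `atomic_add`, `Actor`, `pending_forward`, and `consume_forward` are hypothetical names, so treat it as an illustration of the idea and not the actual engine code.

```cpp
#include <atomic>

// Hypothetical helper: add `delta` to an atomic float using a CAS loop.
// If another thread changes the value first, compare_exchange_weak fails,
// reloads `seen` with the current value, and we retry, so no update is lost.
inline void atomic_add(std::atomic<float>& target, float delta) {
    float seen = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(seen, seen + delta,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
        // `seen` now holds the fresh value; loop and try again.
    }
}

// Hypothetical actor with one accumulated input value.
struct Actor {
    std::atomic<float> pending_forward{0.0f};
    float forward_rate = 80.0f;  // the "say 80.0" from the post

    // Input side: the 'forward' callback adds the craft's forward rate.
    void forward() { atomic_add(pending_forward, forward_rate); }
};

// Physics side: read the accumulated value, apply forces with it, then
// subtract exactly what was read. Anything written while `apply` was
// running stays in the variable for the next world step.
template <typename ApplyForces>
float consume_forward(Actor& actor, ApplyForces apply) {
    float taken = actor.pending_forward.load(std::memory_order_acquire);
    apply(taken);
    atomic_add(actor.pending_forward, -taken);
    return taken;
}
```

Subtracting the amount read, instead of storing zero, is what keeps a write that lands mid-step from being thrown away. A multi-axis value that does not fit in a register would need a different approach than this single-float sketch.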