3500 points per day on a Quad Core CPU? Icrontic shows you how!

Icrontic forums folding expert Jordan Van Der Meijden (Ultra Nexus) shows us how to absolutely maximize folding performance and get record numbers from a single dedicated computer.

September 6, 2007 12:00 AM ET in Articles


This guide is aimed at those lucky (or crazy) enough to have a Quad Core processor dedicated to folding and running 24/7. If that is not your case (perhaps you have a Quad but use it in your main computer), this guide will not help you. The reason is not that the setup is more CPU intensive, but that it is memory intensive: each of the two Virtual Machines will allocate between 500 and 800 megabytes of RAM!

These are the minimum requirements for 2 SMP (symmetric multiprocessing) clients on a quad core processor:

  1. An Intel Quad core processor (Either an Intel Core2 Quad or Core2 Quad Extreme), AMD X4 (when available) or a dual socket, dual core rig.
  2. 2 gigabytes of RAM or more
  3. Very good cooling if overclocking.

The Folding@Home SMP client, although much more efficient than running multiple uniprocessor clients, still isn’t the most efficient way to fully utilize the processing power of a Quad processor.

The solution? Run two SMP clients!

The SMP client was designed with four processors in mind, so each time the client is launched, 4 instances of FAHCore are created and spread across the logical CPU cores. The problem (at least with Intel’s CPUs) is that a Quad consists of just two dual core CPUs glued together, so communication between the two dual core units goes through the front side bus and northbridge chipset, which causes overhead.

So, the idea is to isolate each SMP client to one physical core (here meaning one dual core unit, consisting of 2 logical cores) so that it doesn’t have to communicate with the other physical core. In short, there will be 2 SMP clients, each generating 4 instances that run only on that client’s own physical core.

The problem is that we cannot tell the client which cores to use: it always creates 4 instances, and on a quad processor it detects all 4 cores and loads the instances across all of them.

With VMware, however, we can create two individual Virtual Machines (guests) and assign two logical cores to each Virtual Machine (VM). If it sounds complicated, it really isn’t.

Once we start up one VM, we then set the affinity (in the Host) for cores 1 and 2, and the second VM to cores 3 and 4. This way, each SMP client will detect only two cores (set previously in the Virtual Machine configuration) and also will utilize the same physical core (just as if we had a Core2 Duo instead). For a dual socket, dual core machine, it’s exactly the same thing. For a dual socket, dual Quad core machine, there is the possibility of creating four VMs with two cores assigned to each. Of course, it’s doubtful that someone with that kind of massive configuration will have it dedicated to Folding. If that someone is out there, however – please consider folding for Team 93!
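The core pairing described above can be sketched in a few lines of Python. This is only an illustration: the function name `plan_affinity` and the VM labels are made up for the example, cores are numbered from 1 to match the text, and it assumes that consecutive core numbers sit on the same dual core unit, which should be checked on the actual machine.

```python
def plan_affinity(vm_names, cores_per_vm=2, first_core=1):
    """Give each VM its own block of consecutive cores (hypothetical helper)."""
    plan = {}
    next_core = first_core
    for name in vm_names:
        # Each VM gets the next `cores_per_vm` cores and nothing else.
        plan[name] = list(range(next_core, next_core + cores_per_vm))
        next_core += cores_per_vm
    return plan


# Single quad: two VMs with two cores each.
print(plan_affinity(["VM1", "VM2"]))

# Dual socket, dual quad: four VMs with two cores each.
print(plan_affinity(["VM1", "VM2", "VM3", "VM4"]))
```

For the single quad case this yields cores 1 and 2 for the first VM and cores 3 and 4 for the second, as in the text.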

Now, on to my VMware configuration:

I use Windows 2003 Server 32bit for the Host. I haven’t tried the 64bit version, since the Virtual Machines can run in 64bit mode even on a 32bit host. I did not install Linux as a Host, mostly because of hardware compatibility, but the idea would be the same. I am using VMware Workstation 6.0.

For the Guest, I use SUSE 10.1 x64 but any other 64bit Distro would do. Here it’s mandatory that the Guest be 64bit Linux as the SMP Linux folding client requires it. I also chose Linux because in most cases, the WUs fold faster (leading to more points per day!)

So, we have a Windows 32bit Host and we will create 2 Linux 64bit Guests. Each will run an instance of the SMP client and will be running on 2 Cores.

Once VMware is installed, you need to create a new Virtual Machine. You will be asked how much disk space and memory to allocate, and how many CPU cores (two, in this case).

My setup is as follows:

  1. Disk space = 4 GB (both root / and swap partitions will be installed in this 4 GB image)
  2. CPU cores = 2
  3. Memory space = 768 MB
  4. Network type = Use NAT (this option will make each VM have its own IP but will use the Host IP to communicate with the internet and the rest of the LAN computers).
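As a rough check on why 2 gigabytes is listed as the minimum, the memory budget can be worked out as below. This is a back-of-the-envelope sketch: the helper name is invented for the example, and it ignores whatever overhead VMware itself adds per VM.

```python
def host_ram_left_mb(total_ram_mb, vm_count, vm_ram_mb):
    """RAM left for the Host after the VMs take their allocation (hypothetical helper)."""
    return total_ram_mb - vm_count * vm_ram_mb


# 2 GB machine, two guests at 768 MB each, as in the setup above.
left = host_ram_left_mb(total_ram_mb=2048, vm_count=2, vm_ram_mb=768)
print(f"RAM left for the Host: {left} MB")
```

With these numbers only about 512 MB remains for the Host, which is why this setup suits a dedicated folding machine rather than a main computer.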

Once created, you can boot from your optical drive and install Linux normally. Once it’s installed and configured, do not install the SMP client yet. Instead, stop the Virtual Machine and CLONE it from within the VMware menu. You will have to reconfigure the Computer Name and the NIC configuration for the cloned Linux OS, as the new VM will generate a new MAC address.

Now that both Virtual Machines are ready, you can install the SMP client on both (which will generate new User IDs). Once the first one starts folding, go to Windows and set the affinity of its VMware process (it should appear with 50% CPU usage) to cores 1 and 2. Then start the second VM and launch its SMP client. Again, set the affinity for the new VMware process to cores 3 and 4, and let it fold!
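Tools that set affinity programmatically usually want a bitmask rather than a list of cores. The sketch below shows how the two core pairs translate into masks. The function name is an assumption for illustration, and cores are numbered from 1 as in the text (Windows Task Manager labels the same cores CPU 0 to CPU 3).

```python
def affinity_mask(cores):
    """Build an affinity bitmask from 1-based core numbers (hypothetical helper).

    Bit 0 stands for core 1, bit 1 for core 2, and so on.
    """
    mask = 0
    for core in cores:
        mask |= 1 << (core - 1)
    return mask


print(hex(affinity_mask([1, 2])))  # first VM: 0x3
print(hex(affinity_mask([3, 4])))  # second VM: 0xc
```

The two masks do not overlap, which is the whole point: neither VM is ever scheduled on the other one's cores.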

With this amazing setup, you will be pushing over 3000 points per day on a single machine. We’ll be tracking the progress of my dedicated Quad Core folder as well as other members’ in our forums.

Any questions are welcomed!

72 Comments:

  1. Great guide, Ultra Nexus!

  2. What kind of point increase are you getting with VMware over a single client ?

    Right now the VMs are doing between 12:30 and 13:20 minutes per frame on the p2653, that's about 4050 to 3800 PPD for both combined.

    It seems the WU does not access RAM much and works most of the time in the CPU's L2, which would explain why my TPF is similar to two individual E6600s.

  3. Right now the VMs are doing between 12:30 and 13:20 minutes per frame on the p2653, that's about 4050 to 3800 PPD for both combined.

    It seems the WU does not access RAM much and works most of the time in the CPU's L2, which would explain why my TPF is similar to two individual E6600s.

    Hmmmm

    I may have to give it a go.....I know what I'm doing this weekend !

    Thanks Ultra !

    Scott

  4. HOLY COW... Nice write up!!!!

  5. I edited the article and published it on our main page. Take a gander, it's a great read!

    PS: Congrats UltraNexus on your first published Icrontic article!

  6. Good stuff, Mr. U!

  7. Good job, Nex. I think you may even have enough here to hack out some workable variations. Thank you, Senor!

  8. I am really glad you all liked it.
    I never would have thought it would turn out to be an Icrontic article!!

    Now lets fold gentlemen (and ladies)!!!

    EDIT: I just received my 2nd Quad from Ebay!

  9. Watch out everyone... Ultra Nexus is gonna be flying past a bunch of people REAL fast!!!

  10. Is 2003 server required to run VMware?

    Hmm if each guest is going to use nearly 800mb of RAM I wont have much left with 2gb total to game or do much of anything else.

  11. No, you can use Windows XP as well, as long as it detects all 4 cores.

    Remember that this guide is for dedicated rigs, on your main rig (where you work, game, etc) its better to just use one SMP client.

    I´ll be experimenting more with the setup, and see if I can lower the RAM requirements without taking too much of a hit on performance.

  12. Could you not use imagecfg available here to set the affinity for the .exe that spawns the core processes? ie: set the original executable to only use core 0 + 1, start it and it should only spawn 2 processes. Maybe. Not tried it but it seems like it should work

  13. Unfortunately, running two SMP clients together at the same time (which would create 8 fah_core instances) tends to make them EUE each other when you Control-C one of them.

    In theory, setting 2 of them to each logical core would work, but since you don't know which fah_core instance belongs to which SMP client, you could be mixing them up and actually degrading overall processing.

    Try it out if you´d like and post back your results if different than my theory!

  14. unfortunately I don't have a quad core to test it on. That tool lets you set affinity on a .exe before you run it. so... set the affinity on one copy of the SMP client to 2 cores, run it, and in theory it should spawn 2 copies of the core process.

  15. Its still best to separate the 2 clients by some sort of OS or user account.

  16. Its still best to separate the 2 clients by some sort of OS or user account.

    Um...why? The standard (non-smp) client runs fine with multiple instances on the same OS as long as you keep the work directory separate etc. Is there a specific issue with running more than one instance of the SMP client? (other than the 'spawning 4 processes' issue)

  17. Unfortunately, running two SMP clients together at the same time (which would create 8 fah_core instances) tends to make them EUE each other when you Control-C one of them.

    SMP client is a whole different monster. Any previous client could be run from a pen drive, moved to any computer on any NT Windows OS and have no problems at all.

  18. Here is a pair of screenshots from the VM#1 running on core 1 and 2, on a p2653 WU

  19. Ahh, pretty pictures! Makes me happy!

  20. We´ll set your future quads up right when you get them.

  21. Here is another method that works.
    I chose this method because I know nothing about linux or VM. And this is not a dedicated machine, with both instances running on Vista I still have a little over a gig of free memory. ( Out of 2 )

    Could you not use imagecfg available here to set the affinity for the .exe that spawns the core processes? ie: set the original executable to only use core 0 + 1, start it and it should only spawn 2 processes. Maybe. Not tried it but it seems like it should work

    As noted the original .exe is not what uses the processor time, so setting the affinity on the executable does not work.
    So what I did to get around this was
    Set up 2 work directories.
    Set one as a service and run the other as the console. The only way I had any luck starting was to start the console first and then the service. ( From the service manager )
    Be sure to set a different Machine ID for each client.
    Start the console client.
    Open Task Manager / Process tab and set the affinity on the exe. and the four spawned instances of the core to 0 and 1
    Start the service.
    Back to the task manager and check both exe.'s and all 8 cores and change the ones that are using all four cores to 3 and 4.
    Voilà

    To stop the console use Ctrl+ C and use the service manager to stop the service. ( I always copy my work folder, Que.dat and unitinfo to another folder before stopping. Just to be safe )

    I tried this same method without assigning affinity and went from 2850PPD down to about 2650PPD. With the affinity properly assigned it went up to 3150PPD on Project 2653 avg. frame times went from 8:52 ( one client ) to 16:18 ea. for 2 clients.

    It may not be quite as efficient as using a virtual machine, but for me it was much easier. Just thought I would share some options.

    Scott

  22. Scott, thanks. That's probably the route I'll take when I eventually start upgrading machines to quad. I am just not interested in Linux at this time. I'm sure it would be fun and rewarding. I just don't have time for another hobby.

  23. I am just not interested in Linux at this time. I'm sure it would be fun and rewarding. I just don't have time for another hobby.

    My thinking exactly.

    Scott

  24. You can try installing Windows guests instead of Linux x64 and see how they compare.

    I will try to build a small downloadable Linux 64 ISO prepared for F@H for this purpose, for those not interested in getting into Linux much (or at all).

  25. Hey Ultra

    I did a quick search on " Windows Guest" and I am equally as confused as before.

    Maybe the "virtual" world is just outside of my grasp. Or maybe I am just stupid.

    Scott

  26. Trust me, it's easier than you think... I've sent you a PM.

  27. Maybe the "virtual" world is just outside of my grasp. Or maybe I am just stupid.

    Not actually stupid, just virtually.

    Ooo, I just indicted myself as well.

  28. Update......Glitch with my method.

    It seems that after a WU finishes, the four cores it was using close and four new ones are spawned. The new cores use all four processors and need their affinity reset. Makes for a lot of babysitting.

    It looks like I may have to figure out this VM stuff afterall.

    Scott

  29. Not actually stupid, just virtually.

    I don't know Leo , I just got a PM from Ultra offering to help me out. And wants me to use some other technology I have never tried....I think it is some sort of IM.

    Hey Ultra

    I will get my kid to help me set it up and let you know when I have joined this Century.

    Scott

  30. Any free versions of VMware?

    And is the Linux version faster than a windows SMP client?

    Edit: NM I see you can use anything.

    Started up an XP version of it, it showed my Abit boot manager, says 'booting from local disk', I clicked once and it locked up my system twice. Its not going so hot so far.

  31. Now see, we get into this Linux talk, and I'm confused. In the article, it's like it's written for those familiar with Linux. "Root" and "Guest"? Linux? Also, it talks about two Linux something or another with Windows? Huh? So, installing two instances of Linux something on top of Windows something? I am not making fun of anything, I just failed to follow the article past the first few paragraphs.

    Answer me this:

    Is that guide written for a system running Windows or Linux?
    If Windows, does VMWare install to it just like other programs install to it (no hocus pocus secret language on black screen, right?)
    The two F@H SMP clients to be used will be Linux SMP units?

  32. I need this spelled out plainly. Check me if I'm wrong. Given that I'm running WinXP, I will need:

    VMWare for Windows XP
    2 X SMP Folding Clients for Windows XP

    Correct?

  33. Any free versions of VMware?

    VMWare Server is free. It doesn't have all the fancy snapshot abilities and such.

    http://www.vmware.com/download/server/

  34. Any free versions of VMware?

    And is the Linux version faster than a windows SMP client?

    Edit: NM I see you can use anything.

    Started up an XP version of it, it showed my Abit boot manager, says 'booting from local disk', I clicked once and it locked up my system twice. Its not going so hot so far.

    Like QCH said, VMware Server is free and has the basic functions required to do this task.

    Linux clients are more efficient, yes.

    Answer me this:

    Is that guide written for a system running Windows or Linux?
    If Windows, does VMWare install to it just like other programs install to it (no hocus pocus secret language on black screen, right?)
    The two F@H SMP clients to be used will be Linux SMP units?

    1.- Windows
    2.- Yes, VMware installs like any other app
    3.- Yes, 2 Linux SMP clients. One on each VM.

    I need this spelled out plainly. Check me if I'm wrong. Given that I'm running WinXP, I will need:

    VMWare for Windows XP
    2 X SMP Folding Clients for Windows XP

    Correct?

    1.- yes
    2.- no, again, Linux clients.

  35. Thank you

  36. I think I'll give it a try. I've got two clients of Win SMP FAH running great on this Windows box, but stats I've seen seem to indicate points production is better with two clients each under a separate, virtual machine, each client with two processor cores.

    Right now, I'm averaging about 3100ppd with two clients on WindowsXP on an overclocked Q6600 quad. I'm seeing claims for 4000ppd on VMWare. Is that bogus reporting?

  37. Oh, just thought of something else. I should have virtualization turned on for the CPU in the BIOS, right?

    Sorry, but still more questions. The only Linux FAH SMP download at Stanford is "Linux (x86-64 bit, only) SMP." I'm running WinXP 32 Bit. Now I'm confused again. (but then, that condition happens often)

  38. Oh, just thought of something else. I should have virtualization turned on for the CPU in the BIOS, right?

    Sorry, but still more questions. The only Linux FAH SMP download at Stanford is "Linux (x86-64 bit, only) SMP." I'm running WinXP 32 Bit. Now I'm confused again. (but then, that condition happens often)

    Yes, enable the Intel Virtualization technology in your BIOS.

    And there is no problem with having a 32bit host (windows in this case) and 64bit guests (Linux) as long as your processor supports it, VMware does its thing "bypassing" Windows platform.

    Do you have experience installing a Linux distro?

    I'm seeing claims for 4000ppd on VMWare. Is that bogus reporting?

    Yes, when you have 2 P2605 or P2653 (worth 1760 points) you can get from 3900 to 4200PPD depending on RAM speed, CAS settings and other fine-tuning settings.

  39. Yes, the processor is a Q6600, which has native 64bit processing capability and virtualization (which I will turn on).

    No, I have zero Linux experience. But why? I'm only installing VMWare and 2 X F@H SMP for Linux clients.

  40. Because you will be running Linux inside each VMWare instance

  41. Does anyone know if Icrontic has an FTP site? I could upload a Linux image preset for folding that could help out those who have never dealt with Linux before...

    The image weighs 450 MB.

  42. We could really use a step by step instructions, for noobs like Leo and I who have like never used Linux before. Instructions like if you make a click you specify it. I can dl at 1mb plus so I dont care about the size if you tell us how to set it up.

  43. I'll see what I can do... If I could pass this preset image on to you, it would save you the big hassle of installing and optimizing Linux. I am not a Linux guru anyway, so my knowledge is the minimum needed to get the client working.

  44. Sent ya a PM for a my FTP info. Once its up one of us can pass along the info to you Leo.

  45. Good deal! So, bottom line: My quad core can produce more under two sort-of Linux installations in Windows than two Windows clients in Windows?

  46. I believe so... the result would be just as if you had 2 E6600 computers running Linux SMPs all the time... thats what a VMware over Windows, with 2 Virtual Machines, with Linux on each emulates.

    But be aware that this method is best for dedicated Quads. This is because my testing shows that the Clients in the VMs will not completely "release" resources when the Host (Windows) requires them (i.e. playing games, encoding, etc).

    The result is that both VMs run at half performance and same for the host.

  47. Ok, I still want to try it, just not on my primary computer. No. 2 in signature is next for an upgrade. I've got all the parts except the Q6600, which is ordered. I'm just waiting on the dimbwit on eBay to get his act together. He's honest, just a dumb%^&.

  48. Leo, YGPM.

    I think I might try installing Windows on one instance and the Linux image on the other for a direct comparison unless Leo gets to it first.

  49. No, I won't get to it first. I'm not all that adventurous when it comes to major software experiments. I just installed UltraVNC a few days ago. I'm still riding a big ego boost from that 'major accomplishment.'

    Mmonnin, I PMed you back. I can't locate the FTP even though you gave me the address.

  50. I found this over at the pond:
    http://forums.pcper.com/showthread.php?t=446782

    I also found something else that would not require 2 VMware sessions but still keep the same PPD.

    I added a second client today for testing as a pre-read. I will report back after I take the next step and see how much I gain in PPD. I've heard as much as 1000PPD from 2 clients to this other setup, which should be close to using VMware.

  51. Great guide! Well summed and great link to preset VM linux images! Thanks!

  52. Looking forward to your report, Mmonnin!

    I've heard as much as 1000PPD from 2 clients to this other setup

    If that's accurate, my No.1 rig would be pulling 4150ppd on the 2653 work unit.

    BTW, all my parts are here for upgrading No. 4 this weekend to quad.

  53. I have only seen that high PPD with a bit more of an OC. My quad is set only to 3GHz for now. I'll bump it up after this test. So far, 1 client was running at 2800PPD at 9:03 per frame. With 2 clients running they are at 17:02 and 19:03 for 1488 and 1330 PPD on each client. So roughly about the same PPD with 1 vs 2 clients.

    My C2D runs at about 1800-1900 PPD at 3.16 so about 1700-1800 each client on the C2Q would be reasonable for 3GHz. So 500-600PPD increase with VMware or this setup.

    And its kinda nice now that Fahmon isnt *hung* for any other computer on the network besides the one its installed on...

  54. Attached is my current fahmon for the VMs on Quad #1 and Quad #2

    Both are doing 4000PPD+ on the P2605s. The only diff between the two is that Quad #2 has 4-4-4-12 timings where Quad #1 has 5-5-5-15.

    Both are @ 3Ghz.

    EDIT: I wonder how much a Yorkfield would do @ 4Ghz

  55. Wow that much difference just with timings!!!

    So each Quad instance is on par with my C2D system. Instant 800PPD at 3GHz with no VMware instances. I may have to try VMware anyway just to see if I can get up to your 2000+ PPD at 3GHz.

    http://distributed.org.ua/forum/inde...showtopic=1149

    It's an automatic affinity changer for FAH SMP. It will automatically set the affinity for each running fahcore.exe every 10 min. Install it, go to services.msc and tell it to run (or restart). I gained 800PPD since this morning when I posted!!!!! Screenies coming in a sec.

  56. A before and after installing of the FAHSMP Affinity CHanger. Thats all I did. Nothing else. 2 clients under WinXP before and after.

    C2Q was just 1 client running on all 4 cores, but was essentially about the same PPD as 2 clients. I posted that info above.
    Before:

    Check out C2Q1 and C2Q2 running at 3GHz. C2D is running at 3.16GHz. And oh my stats as well!!
    After:

  57. Each quad instance is now outperforming my C2D by about 100PPD. I did install it on my C2D system, but received minimal results. Up to 6770PPD with those same machines.

    Just because of this, being able to run 2 clients w/o VMware, I may just upgrade my desktop to a quad.

    Oh crap...I forgot to post this from this morning. But yeah uber!

  58. I am all over this. Getting ready to install that new trick.

    BTW, as I write this, I'm installing Windows on system No. 4. New Q6600 and Abit IP35-E is humming away, just waiting for the OS and drivers...and 2 X WinSMP FAH!

  59. Alright! The Affinity Changer is running on System No. 1 now. Of course, it will be a day or so before I can verify results.

    Bonus! System No. 4 is now a quad. 2 X FAH WinSMP with Affinity Changer are now running on it as well.

  60. w00t!!!! I moved to 2 clients at night, got the 1300/1400+ readings, installed in the morning before work and was up 800PPD later that night.

    Both C2Q clients are sitting 40-50PPD short of 2000 while my C2D at 160MHz more is just over 1800MHz. Pretty sure the timings are pretty relaxed on the C2D but thats for another day. I dont wont to end up like Keebs and making a post like "Im F'ing wasted" or some **** like that even though I am!!!! I lub U keeber!!! ROFL

  61. It sounds nice- but I had this thought-

    Would you recommend it only for a dedicated folding box, like Nex recommend with the VMWare?

    You see, I'm not so sure about putting a 3rd-party (from Russia, I think) utility in a box to run an application I turn over my Admin password to.

    I'm sorry, I know that's paranoid- but I have reservations about putting it on a main rig.

  62. There is no password required. You use your password to install FAHSMP so whats the difference.

    Dedicated or not, it doesn't make a difference. Sure, if you use your cores for something else you will get less of an improvement since other things are taking CPU power. But this is better than running VMware instances on your main and more PPD than 1 or 2 clients.

  63. Some preliminary results. Yesterday I put together system No. 4 in signature. I don't yet have a high overclock on it, as it's a new, untested system. The OC right now is 600MHz, 2.4 -> 3.0GHz. FAHMon shows it to be outproducing what my other quad was producing by 700PPD. The other quad, No.1, does now have Affinity Changer installed and running, but FAHMon is not yet reporting the new performance rate, rather the old rate, which is the average performance before Affinity Changer was installed.

    from Russia, I think

    Close, but no cigar. It's from the Ukraine.

  64. FAHMon doesnt show an increase right away, it must use some type of average to get PPD. I played a game on my C2D, the PPD went down to 1300PPD from 1800PPD. It gradually went back up to 1800PPD I.E. I saw scores in 1400, 1500 PPD range.

  65. Correct. FAHMon shows a much higher PPD on my new quad because there is no 'slower' pre-Affinity data. The other quad system gets all the pre-Affinity data included in factoring the PPD.

  66. Amazing! Compare the red boxed area to the green boxed area. Both boxed areas represent a Q6600 with two FAH SMP clients processing work units. In both cases the work units are 2653. The red box reflects the increased Folding efficiency with Affinity Changer running. The green box has Affinity Changer running as well, but is still showing points per day production with previous production rates factored in. Granted, the computer represented by red is overclocked higher. Nevertheless, we are looking at about an 800PPD improvement with Affinity Changer running as a service. (The greater overclock of red over green is probably worth about 200PPD.) This is huge, simply huge.

  67. This is huge, simply huge.

    Thats what SHE said!

    I just had to, it was wide open.

  68. Got my interest here, so two instances of the SMP client running on a Quad under windows will produce more pointy things running the optimizer.??? yeah not my usual schpeill I know Leo, but it will help My surge to the top 100 actives on team 93 so asking??

  69. It just takes the place of having to use VMware, and its more efficient than running 2 clients alone, thats it. I'm not comparing Windows vs Linux.

  70. Spike, I'd like to echo what Mmonnin stated. I am not in a position to compare VMWare/Windows/Whatever, but I can say that based on production data so far, Affinity Changer boosts production on my Windows XP boxes by 700+ PPD. Conditions are two Windows FAH SMP clients running on the affected computers. Sorry, don't know about VMWare. I won't try it, now that Affinity Changer is working its magic.

  71. I might try it to compare Linux vs Windows. I found another guide on the pond as well for that.
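
For readers who want to check the points per day figures quoted in the comments, the conversion from time per frame (TPF) to PPD can be sketched as follows. The function names are made up for the example. The 1760 point value for p2653 comes from the comments, while the figure of 100 frames per work unit is an assumption that happens to reproduce the numbers posted in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tpf_to_seconds(tpf):
    """Convert a 'minutes:seconds' frame time such as '12:30' to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


def points_per_day(wu_points, tpf, frames_per_wu=100):
    """Estimate PPD for one client from its time per frame.

    frames_per_wu=100 is an assumption, not a figure stated in the thread.
    """
    seconds_per_wu = tpf_to_seconds(tpf) * frames_per_wu
    return wu_points * SECONDS_PER_DAY / seconds_per_wu


# Frame times reported for the two VMs on p2653 (1760 points per WU).
for tpf in ("12:30", "13:20"):
    per_vm = points_per_day(1760, tpf)
    print(tpf, round(per_vm), "PPD per VM,", round(2 * per_vm), "PPD for two VMs")
```

Under that assumption, 12:30 and 13:20 per frame come out at roughly 4050 and 3800 PPD for two VMs combined, in line with the figures reported above.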
