Message boards : Number crunching : Does boinc hold on to cores?
Author | Message |
---|---|
BKFC Joined: 21 Apr 20 Posts: 34 Credit: 3,160,585 RAC: 0 |
I have a Ryzen7-2700x with 8 cores that is set up to run Rosetta@home at about 65% (to avoid overheating) when I'm not using the machine for other purposes. One of those purposes is running a molecular dynamics simulator (LAMMPS) that employs MPI to distribute the work over the cores. After nothing but Rosetta@home was running overnight, I launched a LAMMPS application this morning. The boinc manager said that Rosetta@home was suspended, but the LAMMPS application only ran on one core, even though the system status monitor showed all other cores (and threads) as idle. I tried this repeatedly, with the same result. Finally I restarted the machine, and then the LAMMPS application ran on all 8 cores (actually all 16 threads). So my question is this: is there anything about boinc that would cause it to hang on to cores (or set some parameter) so that other applications such as LAMMPS assume the full machine is not available? Or does anyone know of a setting that I can 'reset' regarding the status of the machine for OpenMPI? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 1,429 |
BKFC wrote: "I have a Ryzen7-2700x with 8 cores that is set up to run Rosetta@home at about 65% (to avoid overheating) when I'm not using the machine for other purposes. One of those purposes is running a molecular dynamics simulator (LAMMPS) that employs MPI to distribute the work over the cores. After nothing but Rosetta@home was running overnight, I launched a LAMMPS application this morning. The boinc manager said that Rosetta@home was suspended, but the LAMMPS application only ran on one core, even though the system status monitor showed all other cores (and threads) as idle. I tried this repeatedly, with the same result. Finally I restarted the machine, and then the LAMMPS application ran on all 8 cores (actually all 16 threads)." Have you tried shutting down BOINC, not just suspending it, rather than restarting the machine? |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Perhaps not so much hang on to cores as hang on to memory, which could be causing LAMMPS to think there is insufficient available for it to run on cores that BOINC was previously using. In Computing preferences, in the Memory section, try deselecting Leave non-GPU tasks in memory while suspended. |
BKFC Joined: 21 Apr 20 Posts: 34 Credit: 3,160,585 RAC: 0 |
I tried shutting down the boinc manager, and checked the box not to run anything when the manager wasn't running. My LAMMPS application still only ran on 1 CPU. Is there a more drastic boinc shutdown option? |
BKFC Joined: 21 Apr 20 Posts: 34 Credit: 3,160,585 RAC: 0 |
Good point, hadn't thought of that, but when I looked, that box was already unchecked. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
BKFC wrote: "I tried shutting down the boinc manager, and checked the box not to run anything when the manager wasn't running. My LAMMPS application still only ran on 1 CPU. Is there a more drastic boinc shutdown option?" On Linux, yes, there is a way to clear Boinc from memory without restarting the PC, but I no longer remember what it is, and I don't know of any way to do it without shutting down Boinc. I think the problem is the suspended tasks: they are locked into their 'slots', and until you get rid of the tasks or actually stop Boinc, they stay tied to that particular 'slot'. |
BKFC Joined: 21 Apr 20 Posts: 34 Credit: 3,160,585 RAC: 0 |
Two items: 1. I checked the status at the Rosetta site, and apparently there are 29 (!) tasks listed as running, even though there are only 16 threads, and boinc only shows 16 tasks. 2. There was some discussion a few years ago about the role of NVIDIA drivers. I just realized that I recently updated my NVIDIA driver, though I fail to see how this would make a difference, since Rosetta@home doesn't use GPUs. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Re 1 (phantom tasks): apparently this can happen when there are network problems, so that the server thinks it has sent tasks that the client never received. It came up here recently; the consensus was not to worry about it, and just let the phantom tasks time out and get resent. Re the other issues: I don’t know enough about running BOINC on Linux to be able to offer any more suggestions… |
BKFC Joined: 21 Apr 20 Posts: 34 Credit: 3,160,585 RAC: 0 |
A third item: I am now watching the system monitor. Rosetta is now using about 10 GB memory (out of 32 GB). I then start LAMMPS; Rosetta is 'suspended', but the memory usage is still around 10 GB (LAMMPS uses a few hundred MB). I've now cycled this process several times, and memory usage is over 11 GB. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
Re "A third item": the OS needs memory to do things, including putting stuff on the screen, switching between tasks, and the other things we are doing. If you suspend Boinc, I'm not sure it releases the memory of any tasks as long as the Boinc Manager is still running. That's a problem some of us run into when we suspend one project to run another for a short period, i.e. a couple of days, and then go back to the original project and resume the tasks. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
To stop, start, or restart BOINC on a recent Debian or Ubuntu, use `sudo systemctl stop boinc-client`, `sudo systemctl start boinc-client`, or `sudo systemctl restart boinc-client` respectively. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
BKFC wrote: "I tried shutting down the boinc manager, and checked the box not to run anything when the manager wasn't running. My LAMMPS application still only ran on 1 CPU. Is there a more drastic boinc shutdown option?" Shutting down the manager does nothing. The manager is just the GUI. You would want to stop the BOINC client (the service that does the scheduling and comms). BOINC will keep tasks in memory if the option is selected, even if they get swapped out or suspended. Turning the “keep tasks in memory” option off should release the memory and CPU thread when tasks get suspended. |
BKFC Joined: 21 Apr 20 Posts: 34 Credit: 3,160,585 RAC: 0 |
I had already selected that option, that is, unchecked the box. I've run R@H for several months with no trouble. The only thing that has changed at my end is that I updated the NVIDIA driver to 4.50. There are some posts that suggest the problem is related to this, but I'm not in a position to downgrade my NVIDIA driver. |
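
MarkJ's point that the manager is only the GUI, while the client is the service that actually runs tasks, can be checked from a shell. The sketch below is a diagnostic, not a fix. It assumes a Linux system with `/proc`, `nproc` from GNU coreutils, and the `boinc-client` systemd unit named earlier in the thread. The function names `count_cpus` and `show_cpu_state` are hypothetical helpers made up for this example. It reports whether the client is still active and how many CPUs a program launched from the current shell is allowed to use; it does not establish why LAMMPS ran on a single core.

```shell
#!/bin/sh
# Sketch only: count_cpus and show_cpu_state are illustrative names, not BOINC tools.

# Count the CPUs named in a Linux CPU list such as "0-7,16-23".
count_cpus() {
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        case $range in
            *-*) lo=${range%-*}; hi=${range#*-}
                 total=$((total + hi - lo + 1)) ;;
            *)   total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

show_cpu_state() {
    # State of the client service (assumes the boinc-client unit from the thread).
    if command -v systemctl >/dev/null 2>&1; then
        echo "boinc-client: $(systemctl is-active boinc-client 2>/dev/null)"
    fi
    # CPUs the kernel allows this shell, and programs started from it, to run on.
    allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "allowed CPU list: $allowed ($(count_cpus "$allowed") CPUs)"
    # Logical CPUs installed, regardless of any affinity restriction.
    echo "installed CPUs: $(nproc --all)"
}
```

After sourcing the file, `show_cpu_state` could be run once while Rosetta@home is suspended and again after `sudo systemctl stop boinc-client`. If the allowed count is lower than the installed count, something has restricted CPU affinity for that shell, which would be worth ruling out before looking further at BOINC.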
©2024 University of Washington
https://www.bakerlab.org