Silly Newbie Tricks - Suspending a work unit

Message boards : Number crunching : Silly Newbie Tricks - Suspending a work unit

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 23983 - Posted: 21 Aug 2006, 1:13:18 UTC

They make things "idiot proof" so that only an idiot can foul them up...

I know the answer is obvious, but my real-world experience doesn't seem to match up.

My laptop is up and running Rosetta now, and I'm able to let it run maybe 6-8 hours a day during the week, before powering down.

I am running small work units (2 hours), so I'm not losing a great deal of crunching, but an hour here and an hour there add up.

I tried three different methods, and all failed to allow me to continue with a work unit when powering back up. Rather, the work unit starts from scratch.

(1) Using "Suspend" button from "Tasks" tab.
(2) Using "Suspend" button from "Projects" tab.
(3) Just shutting down Win-doze from the "Start" button.

I "assume" any of these three should have allowed me to "Resume".

How often are checkpoints created? If every hour, then I guess it's possible that it would have to start from the beginning.

I know this is an eye-dee-ten-tea ("ID10T") enduser-error, so any directions to the path of enlightenment will be sincerely appreciated!
Alan Roberts
Joined: 7 Jun 06
Posts: 61
Credit: 6,901,926
RAC: 0
Message 23992 - Posted: 21 Aug 2006, 2:12:17 UTC

Here's what I do with my laptop:

  • Change your General Preferences so that Leave applications in memory while preempted? is set to Yes
  • When shutting down my laptop for a short period of time I Stand by (aka suspend-to-RAM, or whatever it's called on your laptop)
  • When shutting down the laptop for a long period of time I Hibernate (suspend to disk)


You maintain forward progress this way, without falling back to the previous checkpoint.

If I need a low-priority reboot, I'll set the project for No new tasks and wait for it to complete the WU. On a higher-priority reboot I'll glance at the graphics and try to reach a model boundary.

I'm new at this, but I don't think you should be restarting a WU from scratch, even without the leave-in-memory setting described above. I think you should restart from the previous checkpoint. I believe Rosetta checkpoints on every model boundary, not on a clock-time basis.


Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 23997 - Posted: 21 Aug 2006, 2:52:16 UTC

If it's restarting the WUs, it may not have reached a checkpoint. Yes, upon completion of a model, a checkpoint is made. And checkpoints may be made within a model as well. Where mid-model checkpoints are possible varies by work unit. Some take more than an hour to reach a point in the calculations where a checkpoint can be taken. So, on such a protein, if you were only 50 minutes into the first model and turned off your machine, you would have to start from the beginning when you restart. This should be fairly rare if your machine is on for the 6-8 hrs at a time you describe.

My understanding is that you will see a bump in the % complete when a mid-model checkpoint has been made. These checkpoints are the fractional portion of the % complete.
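
To illustrate the behavior described in this post, here is a minimal Python sketch of checkpointing at model boundaries only. It is not Rosetta's code: the function name, the JSON checkpoint file, and the idea of counting "steps" per model are all assumptions made for the example. It shows why a shutdown part-way through the first model restarts from scratch, while a shutdown after at least one completed model resumes from the last saved boundary.

```python
import json
import os
import tempfile


def run_work_unit(total_models, checkpoint_path, stop_after_steps=None, steps_per_model=4):
    """Hypothetical work unit: state is saved only when a whole model finishes.

    stop_after_steps simulates switching the machine off after that many steps.
    Returns the number of models that are safely on disk (or finished).
    """
    models_done = 0
    # Resume from the last checkpoint if one exists, otherwise start from scratch.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            models_done = json.load(f)["models_done"]

    steps_run = 0
    while models_done < total_models:
        for _ in range(steps_per_model):
            if stop_after_steps is not None and steps_run >= stop_after_steps:
                return models_done  # power off: the model in progress is lost
            steps_run += 1
        models_done += 1
        # Write to a temporary file first so a crash cannot leave a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"models_done": models_done}, f)
        os.replace(tmp_path, checkpoint_path)
    return models_done


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    print(run_work_unit(3, path, stop_after_steps=6))  # stopped part-way through model 2
    print(run_work_unit(3, path))                      # resumes at model 2, not model 1
```

With mid-model checkpoints, the same idea applies at a finer grain: the more often a safe point is written, the less work a shutdown can throw away.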
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 23998 - Posted: 21 Aug 2006, 2:57:32 UTC

With BOINC 4.x I could suspend the project then exit BOINC and a checkpoint would be forced so that on restart the current model would begin where it left off. On BOINC 5.4.9 I've noticed that this doesn't always work. Seems to be a flakey bug in it.
Scott14o
Joined: 7 Apr 06
Posts: 24
Credit: 2,147,598
RAC: 0
Message 24002 - Posted: 21 Aug 2006, 3:49:37 UTC

I, too, find it annoying that the checkpoints come only after every model. My computer isn't the fastest, so it sometimes takes a while on each model. It would be nice to be able to shut down my computer for the night and know that the hour and a half of work it had already done wasn't wasted.

Are there plans to allow it to have more checkpoints?
Avi
Joined: 2 Aug 06
Posts: 58
Credit: 95,619
RAC: 0
Message 24003 - Posted: 21 Aug 2006, 3:50:50 UTC - in response to Message 23998.  

With BOINC 4.x I could suspend the project then exit BOINC and a checkpoint would be forced so that on restart the current model would begin where it left off. On BOINC 5.4.9 I've noticed that this doesn't always work. Seems to be a flakey bug in it.

I recall reading that at certain points there is more than 300 MB of data that needs to be stored.

When I shut my laptop, I usually hibernate. Then I have no fear of missing out from a checkpoint in rosetta, AND the laptop starts up much faster afterwards.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 24112 - Posted: 21 Aug 2006, 16:15:58 UTC - in response to Message 23998.  

Keith
With BOINC 4.x I could suspend the project then exit BOINC and a checkpoint would be forced so that on restart the current model would begin where it left off. On BOINC 5.4.9 I've noticed that this doesn't always work. Seems to be a flakey bug in it.

I don't think you are correct on that. BOINC has no way to force applications to perform a checkpoint. Checkpointing (and the lack thereof) has been a problem for many BOINC projects.

Scott
Are there plans to allow it to have more checkpoints?

Rosetta did add the mid-model checkpoints. The team seems aware that additional checkpoints are desirable where possible, especially for the larger proteins, which take longer to complete each model and to reach the mid-model checkpoints.

They end up in a catch-22 situation where if they checkpoint too frequently, they are consuming your machine resources in performing the checkpoints, rather than doing the science. If they don't checkpoint frequently enough, they end up losing sometimes significant amounts of the science work that has been done. It's a fine line to walk.
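
The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope model, not anything taken from Rosetta: it assumes a fixed cost per checkpoint, one shutdown per session, and that a shutdown loses on average half a checkpoint interval. The 7-hour session is taken from the 6-8 hours mentioned at the top of the thread; the 30-second checkpoint cost is purely hypothetical. The optimum uses Young's well-known first-order approximation, sqrt(2 x cost x time between interruptions).

```python
import math


def wasted_fraction(interval_min, checkpoint_cost_min, session_min):
    """Approximate share of a session that does no useful science.

    interval_min:        minutes of crunching between checkpoints
    checkpoint_cost_min: minutes spent writing one checkpoint (assumed)
    session_min:         minutes the machine runs before it is shut down
    """
    overhead = checkpoint_cost_min / interval_min       # time spent checkpointing
    expected_loss = (interval_min / 2) / session_min    # work lost at shutdown, on average
    return overhead + expected_loss


def best_interval(checkpoint_cost_min, session_min):
    """Young's approximation for the interval that minimizes the waste above."""
    return math.sqrt(2 * checkpoint_cost_min * session_min)


session = 7 * 60   # a 7-hour session, in minutes
cost = 0.5         # hypothetical 30-second checkpoint

for interval in (1, best_interval(cost, session), 60):
    print(f"checkpoint every {interval:5.1f} min -> {wasted_fraction(interval, cost, session):.1%} wasted")
```

Under these assumptions, checkpointing every minute wastes far more than checkpointing every hour, and the minimum sits in between, which is the "fine line" described above.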

The good news is that with the "watchdog" they've struck a balance that introduces a failsafe mechanism that ends the WU for you if the combination of the type of WU and the relative speed of the machine or time it is taking to reach a checkpoint is causing a specific WU not to make progress. This sort of puts a cap on environments and combinations that are losing significant crunch time and not reaching checkpoints where the work is preserved.
John McLeod VII
Joined: 17 Sep 05
Posts: 108
Credit: 195,137
RAC: 0
Message 24196 - Posted: 21 Aug 2006, 22:43:00 UTC - in response to Message 24112.  

With BOINC 4.x I could suspend the project then exit BOINC and a checkpoint would be forced so that on restart the current model would begin where it left off. On BOINC 5.4.9 I've noticed that this doesn't always work. Seems to be a flakey bug in it.

I don't think you are correct on that. BOINC has no way to force applications to perform a checkpoint. Checkpointing (and the lack thereof) has been a problem for many BOINC projects.

This is correct. Whenever a project application wishes to checkpoint, it asks the BOINC client if it is time yet. If it is time for a checkpoint, the project checkpoints, and if it is not, the project is not supposed to checkpoint. A couple of projects ignore this: CPDN checkpoints once every 5 to 60 minutes, ignoring the checkpoint timer, and it is common for the first cut of checkpointing in Alpha-level projects to miss this detail. The most recent one of these checkpointed about 5 times per second on a fast machine.
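
The handshake described above can be sketched in a few lines of Python. In the real BOINC API (C/C++) the corresponding calls are, as far as I know, boinc_time_to_checkpoint() and boinc_checkpoint_completed(); the class and function below are only hypothetical stand-ins that imitate that behavior, with an assumed minimum interval between checkpoints.

```python
class CheckpointTimer:
    """Stand-in for the client side: says whether enough time has passed."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self.last_checkpoint_s = 0

    def time_to_checkpoint(self, now_s):
        return now_s - self.last_checkpoint_s >= self.min_interval_s

    def checkpoint_completed(self, now_s):
        self.last_checkpoint_s = now_s


def crunch(timer, safe_points_s):
    """Application side: at each point where state could be saved, ask first.

    safe_points_s lists the times (in seconds) at which a checkpoint is possible.
    Returns the times at which a checkpoint was actually written.
    """
    written = []
    for now_s in safe_points_s:
        if timer.time_to_checkpoint(now_s):
            written.append(now_s)              # a real application would save its state here
            timer.checkpoint_completed(now_s)
    return written


timer = CheckpointTimer(min_interval_s=60)
print(crunch(timer, [10, 50, 70, 100, 130, 200]))  # only some safe points are used
```

An application that skips the question and writes at every safe point is the misbehaving case described in the post: it is limited only by how fast it can reach safe points.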


BOINC WIKI
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 24220 - Posted: 22 Aug 2006, 2:10:10 UTC

Hi John.

You might be able to tell if what I'm seeing is a Rosetta or BOINC problem.

I'm running BOINC alpha 5.5.13 because of the problems with the two Setis. I have just joined Rosetta and I have my apps switching every 2 hrs now. Seti preempts O.K., but Rosetta kept going until it finished the first couple of WUs. Today it got new work and it preempted at about 2.5 hrs, and there is nothing in the messages about it, only that Seti has started.

Any ideas?



Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 24223 - Posted: 22 Aug 2006, 2:37:56 UTC

Hey Peter, welcome to Rosetta!

I've heard something about a feature coming soon in BOINC where it would preempt at checkpoints. Perhaps that's in your alpha version? So, after 2hrs of crunching Rosetta, it tapped that Rosetta WU on the shoulder and said "hey we'd like you to pack up for a bit" and it took the WU another half hour to reach a checkpoint, at which time BOINC rescheduled the CPUs. Does that sound like what happened?
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 24234 - Posted: 22 Aug 2006, 3:47:48 UTC

Hi Feet1st.

Possible, but I just came home and now Seti has kept running too. It's over 2.5 hrs and still going, I guess until it finishes! I will see what happens; I might have to go back to 5.4.9.

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 24242 - Posted: 22 Aug 2006, 7:50:24 UTC - in response to Message 24234.  

Hi Feet1st.

Possible, but I just came home and now Seti has kept running too. It's over 2.5 hrs and still going, I guess until it finishes! I will see what happens; I might have to go back to 5.4.9.


Finally! I think this is a very good feature. I was tired of seeing BOINC reschedule when the old WU was almost done. I think it is much better to finish the current WU when it's near completion and then reschedule, rather than reschedule every 2 hrs regardless of whether there was a recent checkpoint or the WU was almost done. Don't you agree?
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 24281 - Posted: 22 Aug 2006, 15:05:58 UTC

Yes, let the rest of the debt system etc. work out the details down the road. My understanding is it waits until a checkpoint is reached. So, it may not be a completed model or completed WU... but it means that no work is lost, even if you aren't keeping in memory or turn off the machine!

A simple way to extract maybe 5% more useful work out of the existing machines. It depends on how often you end BOINC or were losing work that hadn't been checkpointed.

If anyone has a link to the details of this upcoming BOINC feature, please post a link.
John McLeod VII
Joined: 17 Sep 05
Posts: 108
Credit: 195,137
RAC: 0
Message 24320 - Posted: 23 Aug 2006, 0:21:28 UTC - in response to Message 24281.  

Yes, let the rest of the debt system etc. work out the details down the road. My understanding is it waits until a checkpoint is reached. So, it may not be a completed model or completed WU... but it means that no work is lost, even if you aren't keeping in memory or turn off the machine!

A simple way to extract maybe 5% more useful work out of the existing machines. It depends on how often you end BOINC or were losing work that hadn't been checkpointed.

If anyone has a link to the details of this upcoming BOINC feature, please post a link.

The 5.5 CPU scheduler waits to switch tasks until the running task has checkpointed no more than 10 seconds before the scheduling check (there is some asynchronous code, and several seconds can disappear if the host is slow and busy), unless there is a task that needs extra CPU time to complete on time. This may suspend a task just a few seconds before it is complete if there is a checkpoint there, but normally a checkpoint will only happen once every few minutes. Problems that had to be dealt with: tasks that run for days without checkpointing (there are projects that do this), and projects that lie about how much work is left (one project I remember had tasks that ran for 100 hours or so of CPU time after 100% complete was reached).
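
As a rough illustration of that rule (not the actual BOINC scheduler code), the Python sketch below works out when a task would be switched out. The function name and arguments are hypothetical; only the 10-second window and the deadline exception come from the description above. It also reproduces the case reported earlier in the thread, where a 2-hour switch interval turned into about 2.5 hours because the next checkpoint was half an hour away.

```python
def preempt_time(switch_at_s, checkpoint_times_s, deadline_at_risk=False, window_s=10):
    """Return the time (seconds of run time) at which the task is switched out.

    switch_at_s:        when the scheduler first wants to run something else
    checkpoint_times_s: times at which the running task writes a checkpoint
    deadline_at_risk:   another task needs the CPU now to finish on time
    window_s:           a checkpoint this recent counts as "just checkpointed"
    """
    if deadline_at_risk:
        return switch_at_s  # no waiting: work since the last checkpoint may be lost
    for checkpoint_s in sorted(checkpoint_times_s):
        if checkpoint_s >= switch_at_s - window_s:
            # Either a checkpoint landed just before the switch request,
            # or the scheduler waits for the next one.
            return max(checkpoint_s, switch_at_s)
    return None  # the task never checkpoints again; a real scheduler needs a fallback


two_hours = 2 * 3600
print(preempt_time(two_hours, [5400, 9000]))                         # waits for the checkpoint at 2.5 h
print(preempt_time(two_hours, [7195, 9000]))                         # checkpointed 5 s ago: switch now
print(preempt_time(two_hours, [5400, 9000], deadline_at_risk=True))  # deadline pressure: switch now
```

The `None` case corresponds to the "tasks that run for days without checkpointing" problem mentioned above, which the real scheduler has to handle separately.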

5.5.13 also implements work fetch that does not fetch a full queue from each project, and keeps the queue full even if there is a risk of late work. The user has indicated that the CPU would probably be idle if there was not enough work to keep it busy.


Jean-David Beyer
Joined: 2 Nov 05
Posts: 188
Credit: 6,465,996
RAC: 6,049
Message 47429 - Posted: 6 Oct 2007, 1:20:42 UTC - in response to Message 24320.  
Last modified: 6 Oct 2007, 1:21:17 UTC

[snip]
The 5.5 CPU scheduler waits to switch tasks until the running task has checkpointed no more than 10 seconds before the scheduling check (there is some asynchronous code, and several seconds can disappear if the host is slow and busy), unless there is a task that needs extra CPU time to complete on time. This may suspend a task just a few seconds before it is complete if there is a checkpoint there, but normally a checkpoint will only happen once every few minutes. Problems that had to be dealt with: tasks that run for days without checkpointing (there are projects that do this), and projects that lie about how much work is left (one project I remember had tasks that ran for 100 hours or so of CPU time after 100% complete was reached).

[snip]


Is rosetta@home one of these? This morning, after about 5 hours, the boincmgr indicated that rosetta@home reached 100% complete, yet it has been running about 10 hours since then. And really running, not stalled. I am running 5.8.16 of the BOINC client and boincmgr. rosetta_5.69_i686-pc-linux-gnu is the program itself.
This is a Red Hat Enterprise Linux 5 system with two 3.06 GHz hyperthreaded Xeon processors and 8 GBytes RAM.

$ ps -fu boinc
UID PID PPID C STIME TTY TIME CMD
boinc 2420 4627 86 03:52 ? 15:04:04 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc 2421 2420 0 03:52 ? 00:00:00 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc 2422 2421 0 03:52 ? 00:00:00 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc 2423 2421 0 03:52 ? 00:00:00 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc 4627 4625 0 Sep29 ? 00:11:16 /home/boinc/BOINC/boinc



Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 47432 - Posted: 6 Oct 2007, 4:34:30 UTC - in response to Message 47429.  


I assume you are asking about the comment I've bolded?

...not to my knowledge. I believe the odd symptoms people are seeing on Linux all relate to tasks that show they are not yet completed: BOINC has requested that they stop crunching and has scheduled another task, but the Rosetta thread continues working, otherwise normally. That is, it will finish at a normal time; it just shouldn't still be running.

Rosetta Moderator: Mod.Sense
Jean-David Beyer
Joined: 2 Nov 05
Posts: 188
Credit: 6,465,996
RAC: 6,049
Message 47444 - Posted: 6 Oct 2007, 10:53:56 UTC - in response to Message 47432.  


You assume correctly. Most rosetta work units seem to complete in 5 to 8 hours for me. This one announced it was 100% complete and had no time remaining at about 5 hours, but it has now run up 22 hours 17 minutes. According to the "top" command, it has consumed 1338:07 (minutes:seconds) of CPU time.

If I knew it was running something important, I would just let it run, but most of this time has run up after boincmgr announced the process was complete.

Also I do not understand the excess rosetta processes.

PID PPID USER PR NI S VIRT RES SHR SWAP %MEM %CPU TIME+ P COMMAND
2420 4627 boinc 39 19 R 56500 45m 20 9632 0.6 74 1342:07 0 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
4629 4627 boinc 34 19 S 35760 5900 3148 29m 0.1 0 1:07.95 0 hadcm3trans_5.41_i686-pc-linux-gnu hadcm3inct_cmus_1920_160_65869824 1085_ocean.year yafbg
2421 2420 boinc 34 19 S 56500 45m 20 9632 0.6 0 0:00.13 2 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
2422 2421 boinc 34 19 S 56500 45m 20 9632 0.6 0 0:00.51 1 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
2423 2421 boinc 35 19 S 56500 45m 20 9632 0.6 0 0:00.04 2 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 47457 - Posted: 6 Oct 2007, 19:07:16 UTC

I guess I would suggest ending BOINC and restarting.

The "excess" processes could be due to BOINC going to a "waiting for memory" state. It then starts up another process and crunches on that until memory again crosses above your preference.

I see you have 4 cores and 8GB of memory. Do your BOINC General Preferences allow it to use at least 25% of that? For both idle and while active?
Jean-David Beyer
Joined: 2 Nov 05
Posts: 188
Credit: 6,465,996
RAC: 6,049
Message 47458 - Posted: 6 Oct 2007, 20:31:18 UTC - in response to Message 47457.  

I guess I would suggest ending BOINC and restarting.

The "excess" processes could be due to BOINC going to a "waiting for memory" state. It then starts up another process and crunches on that until memory again crosses above your preference.

I see you have 4 cores and 8GB of memory. Do your BOINC General Preferences allow it to use at least 25% of that? For both idle and while active?


I do not see why my machine would have any trouble getting memory for a BOINC application. I have 8 GBytes RAM and allow 75% of it to BOINC when the machine is busy (whatever that means) and 95% when the machine is not busy. Typically, 75% of the RAM is devoted to the input cache, although that can go down somewhat when I run a postgreSQL database application.

I tried stopping BOINC and everything stopped except for the rosetta programs that kept running. The one with all the time on it was the parent of the other three.

I killed them and restarted BOINC and all seems to be running normally. I assume I lost 30 hours credit for that mess.
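
For anyone chasing leftover processes like these, a small Python sketch can read the PID and PPID columns of `ps -f` output and list everything that descends from a given process. The sample rows are the UID, PID and PPID columns from the listing earlier in this thread; the function names are made up for the example, and the script only reports the tree, it does not kill anything.

```python
from collections import defaultdict, deque

# UID, PID and PPID columns copied from the ps listing earlier in the thread.
PS_SAMPLE = """\
UID PID PPID
boinc 2420 4627
boinc 2421 2420
boinc 2422 2421
boinc 2423 2421
boinc 4627 4625
"""


def parse_ps(text):
    """Map each parent PID to the list of its direct children."""
    children = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip the header row and blank lines
        pid, ppid = int(fields[1]), int(fields[2])
        children[ppid].append(pid)
    return children


def descendants(children, root_pid):
    """All processes below root_pid, nearest generation first."""
    found = []
    queue = deque([root_pid])
    while queue:
        for child in children.get(queue.popleft(), []):
            found.append(child)
            queue.append(child)
    return found


tree = parse_ps(PS_SAMPLE)
print(descendants(tree, 4627))  # everything started under the BOINC client
print(descendants(tree, 2420))  # everything under the busy rosetta process
```

In this listing 2421 is a child of 2420, and 2422 and 2423 are children of 2421, so all three sit under the process that accumulated the CPU time.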
Jean-David Beyer
Joined: 2 Nov 05
Posts: 188
Credit: 6,465,996
RAC: 6,049
Message 47582 - Posted: 10 Oct 2007, 2:19:19 UTC - in response to Message 47458.  


Progress report, sort-of. I probably did not lose any credit, at least as yet. After the boinc client scheduler got around to it, it resumed that 100% progress work unit again and it ran quite a few hours more. Then it started another part of the same work unit (same line in boincmgr), reset the time run to 0, but still indicating 100% progress with no time remaining. Since then it has run up more than 37 hours. I propose to let it run another day or so and see what happens.


©2024 University of Washington
https://www.bakerlab.org