Task keeps restarting

Message boards : Number crunching : Task keeps restarting

[DPC]NGS~killerog

Joined: 20 Mar 06
Posts: 3
Credit: 82,640
RAC: 0
Message 66999 - Posted: 25 Jul 2010, 10:42:41 UTC

Hello,

I recently started working on Rosetta again, but for the past few days no work has been getting done; for some reason the task crashes. I deleted a few tasks and it worked again for a while, but it happened again today. This is what I saw in my logs:

25/07/2010 10:33:25 rosetta@home Computation for task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2182_0 finished
25/07/2010 10:33:25 rosetta@home Starting rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0
25/07/2010 10:33:28 rosetta@home Starting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 10:33:32 rosetta@home Started upload of rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2182_0_0
25/07/2010 10:33:36 Project communication failed: attempting access to reference site
25/07/2010 10:33:36 rosetta@home Temporarily failed upload of rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2182_0_0: can't resolve hostname
25/07/2010 10:33:36 rosetta@home Backing off 1 min 0 sec on upload of rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2182_0_0
25/07/2010 10:33:39 Internet access OK - project servers may be temporarily down.
25/07/2010 10:34:36 rosetta@home Started upload of rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2182_0_0
25/07/2010 10:34:44 rosetta@home Finished upload of rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2182_0_0
25/07/2010 10:34:48 rosetta@home Sending scheduler request: To report completed tasks.
25/07/2010 10:34:48 rosetta@home Reporting 1 completed tasks, not requesting new tasks
25/07/2010 10:34:51 rosetta@home Scheduler request completed
25/07/2010 10:36:24 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 10:44:09 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 10:52:49 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 10:56:08 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:03:07 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:13:11 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:19:03 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:24:17 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:34:05 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:42:39 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 11:51:45 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 12:14:20 rosetta@home Restarting task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 using minirosetta version 214
25/07/2010 12:20:09 rosetta@home task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 suspended by user
25/07/2010 12:20:11 rosetta@home task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 resumed by user
25/07/2010 12:20:31 rosetta@home task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 suspended by user
25/07/2010 12:35:47 rosetta@home task rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0 resumed by user

I run BOINC Manager 6.10.58 on an MSI Megabook with Windows 2k SP4.

Any hints would be appreciated.

Regards,

killerog.
ID: 66999 · Rating: 0
Murasaki

Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 67002 - Posted: 25 Jul 2010, 11:02:04 UTC

You may want to use the Number crunching forum in future as that is where error reports are normally posted.

You mentioned that your tasks are crashing. In what way are they crashing? Are they producing error reports? Are they just slowing down so much they seem to have stopped? Are they resetting themselves back to zero?

Unfortunately, as far as I can tell, your logs don't show much: you had a brief communication problem with the server, which is normal, and you suspended and resumed one task twice.
ID: 67002 · Rating: 0
[DPC]NGS~killerog

Joined: 20 Mar 06
Posts: 3
Credit: 82,640
RAC: 0
Message 67003 - Posted: 25 Jul 2010, 11:08:50 UTC

I assume that it crashed because I no longer saw the minirosetta_2.1 process in my Task Manager while BOINC Manager wasn't doing any tasks.

Could it be that the task can't be restarted because I work on a non-admin account? But then, why is it trying to restart in the first place? Very strange.

ID: 67003 · Rating: 0
Murasaki

Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 67004 - Posted: 25 Jul 2010, 11:21:47 UTC

I am still not clear on what the problem is. Can you please explain exactly what you observed at each stage to make you think a crash occurred?

Log entries for "restarting" are normal. Tasks will often pause for a while and then restart where they left off. Are you observing something different to the normal restart behaviour?
ID: 67004 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67005 - Posted: 25 Jul 2010, 16:10:02 UTC

Moved thread to Number Crunching.
Rosetta Moderator: Mod.Sense
ID: 67005 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67006 - Posted: 25 Jul 2010, 16:21:13 UTC

Odd that it seems to restart every 5 minutes.
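
To check the restart cadence, the gaps between the "Restarting task" entries in the log posted above can be computed directly. This is only a minimal Python sketch: it assumes the DD/MM/YYYY HH:MM:SS timestamp format seen in that log, and the function name `restart_gaps` is made up for illustration.

```python
from datetime import datetime

# Task name and restart times copied from the log in the first post.
TASK = "rb_07_19_269_927_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21674_2180_0"
RESTART_TIMES = [
    "10:36:24", "10:44:09", "10:52:49", "10:56:08", "11:03:07", "11:13:11",
    "11:19:03", "11:24:17", "11:34:05", "11:42:39", "11:51:45", "12:14:20",
]
LOG_LINES = [
    f"25/07/2010 {t} rosetta@home Restarting task {TASK} using minirosetta version 214"
    for t in RESTART_TIMES
]


def restart_gaps(lines):
    """Return the gaps, in seconds, between consecutive 'Restarting task' entries."""
    stamps = [
        # The first 19 characters of each line hold the timestamp.
        datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S")
        for line in lines
        if "Restarting task" in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


gaps = restart_gaps(LOG_LINES)
for g in gaps:
    print(f"{g // 60} min {g % 60:02d} sec")
```

For the posted log the gaps come out between roughly 3 and 23 minutes, so the cadence looks irregular rather than fixed.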

So long as BOINC is running under the same user that installed it, the authority shouldn't be a problem. Also, you would normally see an error to that effect if it were.

I note your machine has only the project minimum of 512 MB. I wonder if perhaps these tasks are using more memory than normal. I also note that the two you've aborted so far have the same task name.

If you still have a task that is causing problems, could you look at the memory used by the minirosetta process in the Task Manager as it restarts? It will generally climb as the task gets established and then level off sometime... well, sometime around 5 minutes into the run. How high is it going just before it restarts again? When it restarts, does it get a new process ID and reset the CPU time to zero?

Don't change multiple things at once, because it just confuses cause and effect. But could you review and let us know how you have BOINC configured? Specifically the settings on the disk and memory usage? (go to the advanced view, click the "advanced" item on the pulldown menu, click preferences, then click the disk and memory usage tab)
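
As a rough illustration of why those memory settings matter on a 512 MB machine, the percentages can be turned into a memory budget. This is a sketch under assumptions: the tag names `ram_max_used_busy_pct` and `ram_max_used_idle_pct` are the ones BOINC uses in `global_prefs_override.xml` as far as I know, and the values 50 and 90 are invented for the example, not read from the poster's machine.

```python
import xml.etree.ElementTree as ET

# Hypothetical preferences excerpt; the values are illustrative only.
PREFS_XML = """
<global_preferences>
    <ram_max_used_busy_pct>50</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
</global_preferences>
"""

TOTAL_RAM_MB = 512  # the project minimum mentioned in this post


def ram_budget_mb(prefs_xml, total_ram_mb):
    """Convert the two RAM percentage preferences into megabyte limits."""
    root = ET.fromstring(prefs_xml.strip())
    busy_pct = float(root.findtext("ram_max_used_busy_pct"))
    idle_pct = float(root.findtext("ram_max_used_idle_pct"))
    return {
        "in_use_mb": total_ram_mb * busy_pct / 100,  # limit while the computer is in use
        "idle_mb": total_ram_mb * idle_pct / 100,    # limit while the computer is idle
    }


budget = ram_budget_mb(PREFS_XML, TOTAL_RAM_MB)
print(budget)
```

With these example values, BOINC would be limited to 256 MB while the computer is in use, which is the number to compare against the memory the task reaches just before it restarts.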
Rosetta Moderator: Mod.Sense
ID: 67006 · Rating: 0
[DPC]NGS~killerog

Joined: 20 Mar 06
Posts: 3
Credit: 82,640
RAC: 0
Message 67008 - Posted: 25 Jul 2010, 18:15:42 UTC

I aborted the last task, so I can't tell about that.

The problem is that both times it happened when I wasn't at home. I'll post more info when it happens again and I am there to notice it.

About the suspending, I did that to see if that might get the task to do something again.

Thanks for the help btw.
ID: 67008 · Rating: 0

©2025 University of Washington
https://www.bakerlab.org