Client errors

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 75203 - Posted: 7 Mar 2013, 6:20:59 UTC

Interesting. Will update NVIDIA drivers and BOINC and see what happens.
ID: 75203 · Rating: 0

Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 75204 - Posted: 7 Mar 2013, 8:13:21 UTC - in response to Message 75203.  

Interesting. Will update NVIDIA drivers and BOINC and see what happens.

Do you have Windows or Linux? I think it would be interesting to update just BOINC and leave the nVidia drivers at the current version. That would show whether this issue is caused by a combination of BOINC and some nVidia drivers or whether it is just a BOINC bug.
ID: 75204 · Rating: 0

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 75206 - Posted: 7 Mar 2013, 13:15:53 UTC - in response to Message 75204.  

Interesting. Will update NVIDIA drivers and BOINC and see what happens.

Do you have Windows or Linux? I think it would be interesting to update just BOINC and leave the nVidia drivers at the current version. That would show whether this issue is caused by a combination of BOINC and some nVidia drivers or whether it is just a BOINC bug.


Windows. I updated both. And it works!
ID: 75206 · Rating: 0

ETQuestor
Joined: 13 Nov 12
Posts: 8
Credit: 957,206
RAC: 0
Message 75210 - Posted: 7 Mar 2013, 23:33:54 UTC - in response to Message 75195.  

I'm running NVIDIA 310.32 on x86_64 linux (Fedora 18). I just updated BOINC from 7.0.36 -> 7.0.52 (x86_64 linux build) and had the first successful task completion on Rosetta in weeks. I'll keep an eye on it to see if that continues. Seems to be some kind of interaction between BOINC and the NVIDIA driver that is weirdly version dependent.



Just wanted to confirm that this combo of BOINC 7.0.52 (linux x86_64) and NVIDIA 310.32 (linux x86_64) is working great...almost a week of successful tasks.
ID: 75210 · Rating: 0

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75214 - Posted: 11 Mar 2013, 0:06:17 UTC
Last modified: 11 Mar 2013, 0:07:08 UTC

I am still getting Client Error for all of my tasks.

Even on the new beta BOINC 7.0.54 x64, using beta nVidia 314.14 drivers, on Windows 8 Pro x64, my Rosetta CPU tasks still complete but end in status Client Error.

I have just turned on the task_debug flag, to see if I can get any more info...
But I don't think the issue is resolved for me yet.
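
For readers who want to turn on the same logging: the BOINC client reads its log flags from a cc_config.xml file in the BOINC data directory, with each flag set inside a log_flags element. The sketch below only builds the text of such a file with task_debug enabled. The helper name build_cc_config is made up for illustration, and where the file is saved depends on the installation, so that step is left out.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config XML text that switches on the given log flags.

    build_cc_config is a hypothetical helper for this example, not part of BOINC.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["task_debug"])
print(config_text)
```

After the file is in place, the client has to re-read its configuration (or be restarted) before the extra [task] lines show up in the event log.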

Why would nVidia drivers be a part of the problem, when the tasks are Rosetta Mini (which only uses CPU right?) Did we ever figure out what we believed fixed it? And do we know for sure it is fixed?

For me, it appears to NOT be fixed :(
ID: 75214 · Rating: 0

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 2,882
Message 75221 - Posted: 11 Mar 2013, 11:01:35 UTC - in response to Message 75214.  
Last modified: 11 Mar 2013, 11:09:00 UTC

I am still getting Client Error for all of my tasks.

Even on the new beta BOINC 7.0.54 x64, using beta nVidia 314.14 drivers, on Windows 8 Pro x64, my Rosetta CPU tasks still complete but end in status Client Error.

I have just turned on the task_debug flag, to see if I can get any more info...
But I don't think the issue is resolved for me yet.

Why would nVidia drivers be a part of the problem, when the tasks are Rosetta Mini (which only uses CPU right?) Did we ever figure out what we believed fixed it? And do we know for sure it is fixed?

For me, it appears to NOT be fixed :(


I am not sure it is fixed, but you went beyond the reported fix versions. You went to BOINC 7.0.54, not 7.0.52, and to Nvidia version 314.14, not 314.7. I am NOT sure there is a difference, but there could be.

Because we can't see the server logs, which should show what the error is, we might never know what the real problem is. The units finish just fine, but when they go back to the Rosetta servers there is a problem and no credits are granted for the work done.

If you choose to downgrade BOINC, just remember that every unit you currently have will be aborted. You can upgrade BOINC with no problems, but downgrading IS a problem.

I just saw this over on MilkyWay:
" Error on Reporting...

3/10/2013 5:32:50 PM | Milkyway@Home | Reporting 39 completed tasks
3/10/2013 5:32:50 PM | Milkyway@Home | Requesting new tasks for ATI
3/10/2013 5:32:56 PM | Milkyway@Home | [error] Can't parse workunit in scheduler reply: unexpected XML tag or syntax
3/10/2013 5:32:56 PM | Milkyway@Home | [error] No close tag in scheduler reply

It is a known problem with 7.0.54, you will need to go back to 7.0.52." Boinc not reporting properly could be affecting you too. Try rolling Boinc back to 7.0.52 and see if that helps. I would leave the Nvidia software alone at least initially.
ID: 75221 · Rating: 0

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75222 - Posted: 11 Mar 2013, 14:27:13 UTC
Last modified: 11 Mar 2013, 14:30:06 UTC

Here are some logs where the result was "Client Error", using BOINC 7.0.54 x64 (which does have an XML parsing glitch in it but is probably not causing this issue), on Windows 8 x64, with nVidia 314.14 beta (which I believe is irrelevant).
Note: I took the liberty to split the details up "per task" to make it easier to see.
The first 4 tasks have completed (and resulted in "Client error" - why?), and the second 4 tasks are still in progress.
What is causing this "Client error"??

Task Details
3/11/2013 4:30:05 AM | rosetta@home | [task] result state=NEW for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from handle_scheduler_reply
3/11/2013 4:30:06 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from CS::update_results
3/11/2013 4:36:22 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from CS::update_results
3/11/2013 4:36:22 AM | rosetta@home | [task] task_state=EXECUTING for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from start
3/11/2013 4:36:22 AM | rosetta@home | Starting task H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 using minirosetta version 345 in slot 11
3/11/2013 5:57:35 AM | rosetta@home | [task] task_state=SUSPENDED for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from suspend
3/11/2013 6:21:03 AM | rosetta@home | [task] task_state=EXECUTING for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from unsuspend
3/11/2013 6:21:03 AM | rosetta@home | Resuming task H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 using minirosetta version 345 in slot 11
3/11/2013 8:23:04 AM | rosetta@home | [task] Process for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 exited, exit code 0, task state 1
3/11/2013 8:23:04 AM | rosetta@home | [task] task_state=EXITED for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from handle_exited_app
3/11/2013 8:23:04 AM | rosetta@home | Computation for task H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 finished
3/11/2013 8:23:04 AM | rosetta@home | [task] result state=FILES_UPLOADING for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from CS::app_finished
3/11/2013 8:23:25 AM | rosetta@home | [task] result state=FILES_UPLOADED for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from CS::update_results

Task Details
3/11/2013 4:30:05 AM | rosetta@home | [task] result state=NEW for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from handle_scheduler_reply
3/11/2013 4:30:06 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from CS::update_results
3/11/2013 4:57:34 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from CS::update_results
3/11/2013 4:57:34 AM | rosetta@home | [task] task_state=EXECUTING for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from start
3/11/2013 4:57:34 AM | rosetta@home | Starting task rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 using minirosetta version 345 in slot 14
3/11/2013 8:05:33 AM | rosetta@home | [task] Process for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 exited, exit code 0, task state 1
3/11/2013 8:05:33 AM | rosetta@home | [task] task_state=EXITED for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from handle_exited_app
3/11/2013 8:05:33 AM | rosetta@home | Computation for task rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 finished
3/11/2013 8:05:33 AM | rosetta@home | [task] result state=FILES_UPLOADING for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from CS::app_finished
3/11/2013 8:05:45 AM | rosetta@home | [task] result state=FILES_UPLOADED for rb_03_10_36692_70150__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76284_187_0 from CS::update_results

Task Details
3/11/2013 4:30:05 AM | rosetta@home | [task] result state=NEW for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from handle_scheduler_reply
3/11/2013 4:30:06 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from CS::update_results
3/11/2013 4:46:06 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from CS::update_results
3/11/2013 4:46:06 AM | rosetta@home | [task] task_state=EXECUTING for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from start
3/11/2013 4:46:06 AM | rosetta@home | Starting task rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 using minirosetta version 345 in slot 4
3/11/2013 5:57:35 AM | rosetta@home | [task] task_state=SUSPENDED for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from suspend
3/11/2013 6:21:07 AM | rosetta@home | [task] task_state=EXECUTING for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from unsuspend
3/11/2013 6:21:07 AM | rosetta@home | Resuming task rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 using minirosetta version 345 in slot 4
3/11/2013 8:30:44 AM | rosetta@home | [task] Process for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 exited, exit code 0, task state 1
3/11/2013 8:30:44 AM | rosetta@home | [task] task_state=EXITED for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from handle_exited_app
3/11/2013 8:30:44 AM | rosetta@home | Computation for task rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 finished
3/11/2013 8:30:44 AM | rosetta@home | [task] result state=FILES_UPLOADING for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from CS::app_finished
3/11/2013 8:30:56 AM | rosetta@home | [task] result state=FILES_UPLOADED for rb_03_10_36700_70558__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76285_125_0 from CS::update_results

Task Details
3/11/2013 4:30:05 AM | rosetta@home | [task] result state=NEW for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from handle_scheduler_reply
3/11/2013 4:30:06 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from CS::update_results
3/11/2013 4:36:19 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from CS::update_results
3/11/2013 4:36:19 AM | rosetta@home | [task] task_state=EXECUTING for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from start
3/11/2013 4:36:19 AM | rosetta@home | Starting task 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 using minirosetta version 345 in slot 2
3/11/2013 5:57:39 AM | rosetta@home | [task] task_state=SUSPENDED for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from suspend
3/11/2013 6:03:47 AM | rosetta@home | [task] task_state=EXECUTING for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from unsuspend
3/11/2013 6:03:47 AM | rosetta@home | Resuming task 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 using minirosetta version 345 in slot 2
3/11/2013 8:05:12 AM | rosetta@home | [task] Process for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 exited, exit code 0, task state 1
3/11/2013 8:05:12 AM | rosetta@home | [task] task_state=EXITED for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from handle_exited_app
3/11/2013 8:05:12 AM | rosetta@home | Computation for task 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 finished
3/11/2013 8:05:12 AM | rosetta@home | [task] result state=FILES_UPLOADING for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from CS::app_finished
3/11/2013 8:05:21 AM | rosetta@home | [task] result state=FILES_UPLOADED for 31.ha14_lp2_hb14_35_bet2_1_0002_abinitio_IGNORE_THE_REST_76204_1621_0 from CS::update_results

Task Details
3/11/2013 6:55:40 AM | rosetta@home | [task] result state=NEW for TLUM_15_S89H56D86Q13N17_1_rsmn_24097_FP2_1_30317_129900001_209650001_abinitio_SAVE_ALL_OUT_75013_152_2 from handle_scheduler_reply
3/11/2013 6:55:41 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for TLUM_15_S89H56D86Q13N17_1_rsmn_24097_FP2_1_30317_129900001_209650001_abinitio_SAVE_ALL_OUT_75013_152_2 from CS::update_results
3/11/2013 7:06:37 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for TLUM_15_S89H56D86Q13N17_1_rsmn_24097_FP2_1_30317_129900001_209650001_abinitio_SAVE_ALL_OUT_75013_152_2 from CS::update_results
3/11/2013 8:05:12 AM | rosetta@home | [task] task_state=EXECUTING for TLUM_15_S89H56D86Q13N17_1_rsmn_24097_FP2_1_30317_129900001_209650001_abinitio_SAVE_ALL_OUT_75013_152_2 from start
3/11/2013 8:05:12 AM | rosetta@home | Starting task TLUM_15_S89H56D86Q13N17_1_rsmn_24097_FP2_1_30317_129900001_209650001_abinitio_SAVE_ALL_OUT_75013_152_2 using minirosetta version 345 in slot 2

Task Details
3/11/2013 6:55:40 AM | rosetta@home | [task] result state=NEW for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from handle_scheduler_reply
3/11/2013 6:55:41 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from CS::update_results
3/11/2013 6:56:03 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from CS::update_results
3/11/2013 7:00:27 AM | rosetta@home | [task] task_state=EXECUTING for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from start
3/11/2013 7:00:27 AM | rosetta@home | Starting task cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 using minirosetta version 345 in slot 9
3/11/2013 8:20:09 AM | rosetta@home | [task] task_state=QUIT_PENDING for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from request_exit()
3/11/2013 8:20:10 AM | rosetta@home | [task] Process for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 exited, exit code 0, task state 8
3/11/2013 8:20:10 AM | rosetta@home | [task] task_state=UNINITIALIZED for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from handle_exited_app
3/11/2013 8:21:10 AM | rosetta@home | [task] task_state=EXECUTING for cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 from start
3/11/2013 8:21:10 AM | rosetta@home | Restarting task cys47__1234_abinitio_SAVE_ALL_OUT_74915_543_2 using minirosetta version 345 in slot 9

Task Details
3/11/2013 6:55:40 AM | rosetta@home | [task] result state=NEW for cys48__2297_abinitio_SAVE_ALL_OUT_74949_542_2 from handle_scheduler_reply
3/11/2013 6:55:41 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for cys48__2297_abinitio_SAVE_ALL_OUT_74949_542_2 from CS::update_results
3/11/2013 6:56:20 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for cys48__2297_abinitio_SAVE_ALL_OUT_74949_542_2 from CS::update_results
3/11/2013 8:05:33 AM | rosetta@home | [task] task_state=EXECUTING for cys48__2297_abinitio_SAVE_ALL_OUT_74949_542_2 from start
3/11/2013 8:05:33 AM | rosetta@home | Starting task cys48__2297_abinitio_SAVE_ALL_OUT_74949_542_2 using minirosetta version 345 in slot 14

Task Details
3/11/2013 8:01:36 AM | rosetta@home | [task] result state=NEW for H3i-D2A2_3ds2A_ProteinInterfaceDesign_20130220_75400_282_0 from handle_scheduler_reply
3/11/2013 8:01:37 AM | rosetta@home | [task] result state=FILES_DOWNLOADING for H3i-D2A2_3ds2A_ProteinInterfaceDesign_20130220_75400_282_0 from CS::update_results
3/11/2013 8:01:42 AM | rosetta@home | [task] result state=FILES_DOWNLOADED for H3i-D2A2_3ds2A_ProteinInterfaceDesign_20130220_75400_282_0 from CS::update_results
3/11/2013 8:23:04 AM | rosetta@home | [task] task_state=EXECUTING for H3i-D2A2_3ds2A_ProteinInterfaceDesign_20130220_75400_282_0 from start
3/11/2013 8:23:04 AM | rosetta@home | Starting task H3i-D2A2_3ds2A_ProteinInterfaceDesign_20130220_75400_282_0 using minirosetta version 345 in slot 11
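
The per-task split above was done by hand. As a rough sketch, the same grouping can be scripted, assuming the [task] lines keep the "state=X for NAME from ..." shape seen in this log. The function name and the regular expression are illustrative only, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches "[task] result state=X for NAME from ..." and
# "[task] task_state=X for NAME from ..." as they appear in the log above.
STATE_RE = re.compile(r"\[task\] (?:result state|task_state)=(\w+) for (\S+) from")


def group_states_by_task(log_text):
    """Map each task name to the ordered list of states seen for it."""
    states = defaultdict(list)
    for line in log_text.splitlines():
        match = STATE_RE.search(line)
        if match:
            state, task = match.groups()
            states[task].append(state)
    return dict(states)


sample = """\
3/11/2013 8:23:04 AM | rosetta@home | [task] task_state=EXITED for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from handle_exited_app
3/11/2013 8:23:04 AM | rosetta@home | Computation for task H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 finished
3/11/2013 8:23:04 AM | rosetta@home | [task] result state=FILES_UPLOADING for H3i-D2A2_3briA_ProteinInterfaceDesign_20130220_75400_293_0 from CS::app_finished
3/11/2013 8:23:04 AM | rosetta@home | [task] task_state=EXECUTING for H3i-D2A2_3ds2A_ProteinInterfaceDesign_20130220_75400_282_0 from start
"""

grouped = group_states_by_task(sample)
for task, sequence in grouped.items():
    print(task, "->", " > ".join(sequence))
```

Note that in the completed tasks above the client side ends with exit code 0 and FILES_UPLOADED, so a script like this can only show that the client-side sequence looks normal; the "Client error" outcome itself does not appear in these lines.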
ID: 75222 · Rating: 0

Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 26,262,530
RAC: 16,974
Message 75223 - Posted: 11 Mar 2013, 15:42:51 UTC

One of my team members tried NV driver version 314.07 on one of the computers with a 100% client error rate (https://boinc.bakerlab.org/rosetta/results.php?hostid=1603836).
It helped: WUs now validate OK.
He did NOT upgrade BOINC. He still uses the standard 7.0.28 version (which gives 100% errors with the 306.x and 310.x NV drivers).

So it seems there is no "fix" in BOINC. Something just changed in the NV drivers.
And I think this is not the final fix for the error. It is just some change associated with something else that coincidentally bypasses the bug for now.
In the next drivers or in some configurations it will pop up again.
ID: 75223 · Rating: 0

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 2,882
Message 75227 - Posted: 12 Mar 2013, 11:23:48 UTC - in response to Message 75222.  

Here are some logs where the result was "Client Error", using BOINC 7.0.54 x64 (which does have an XML parsing glitch in it but is probably not causing this issue), on Windows 8 x64, with nVidia 314.14 beta (which I believe is irrelevant).
Note: I took the liberty to split the details up "per task" to make it easier to see.
The first 4 tasks have completed (and resulted in "Client error" - why?), and the second 4 tasks are still in progress.
What is causing this "Client error"??


Dr. David Anderson, the man who invented BOINC and is STILL its main programmer, came over and looked at Rosetta and couldn't seem to fix the problem either. I am not sure any of us have the kind of access needed to REALLY find or fix the problem. That is NOT to say he gave up and walked away; it just means it doesn't appear to be an obvious fix. New versions of BOINC come out all the time, and he could be incorporating what he thinks will fix it into them. When/if that happens, I am guessing Rosetta will post it far and wide.
ID: 75227 · Rating: 0

Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 26,262,530
RAC: 16,974
Message 75228 - Posted: 12 Mar 2013, 11:49:21 UTC
Last modified: 12 Mar 2013, 11:49:46 UTC

Another team member tried NV 314.07, and it did NOT help in his case: https://boinc.bakerlab.org/rosetta/results.php?hostid=1555324

I will try upgrading BOINC now.

P.S.
The main difference between the computers: GTX 6xx (Kepler) cards in the first (the 314.07 drivers help) and a GTX 580 (Fermi) in the second (they do not help).
ID: 75228 · Rating: 0

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75229 - Posted: 12 Mar 2013, 13:47:47 UTC - in response to Message 75227.  
Last modified: 12 Mar 2013, 13:49:23 UTC

Dr. David Anderson, the man who invented BOINC and is STILL its main programmer, came over and looked at Rosetta and couldn't seem to fix the problem either. I am not sure any of us have the kind of access needed to REALLY find or fix the problem. That is NOT to say he gave up and walked away; it just means it doesn't appear to be an obvious fix. New versions of BOINC come out all the time, and he could be incorporating what he thinks will fix it into them. When/if that happens, I am guessing Rosetta will post it far and wide.


I communicated with David Anderson. He said that the log messages coming from BOINC Manager do not indicate a problem with BOINC Manager/client. He suggested that, if Rosetta is the only project exhibiting this "Client error" behavior, then it is likely either a project setup problem or a server problem. He requested that, if the project administrator could not fix it on his/her own, they should get in contact with him.

I've sent a Private Message to David E K, suggesting that he work with David Anderson to fix this issue, but have not yet received a reply.

We beta testers are finalizing testing on a version of BOINC that will soon be released publicly, which is why I was vocal about determining the cause of this issue.
If it's a BOINC Manager/client issue, then it should be fixed before release, but as I said, it appears to be a project/server (backend) issue.

Regardless, I just want it fixed.
David E K, can you please make progress fixing it?

Thanks,
Jacob
ID: 75229 · Rating: 0

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 75230 - Posted: 12 Mar 2013, 17:05:05 UTC - in response to Message 75017.  

David

do you still plan to update the Scheduler?

And if so, when do you expect to do this?

Thanks

Rayburner

We haven't changed the scheduler in a long time so it's likely the driver update that broke things.

Can anyone confirm that ralph does not have this issue?

I'll ask people here to submit more test jobs to Ralph.

I don't know when I'll be able to update the server but hopefully next month.

thanks!


ID: 75230 · Rating: 0

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 75231 - Posted: 12 Mar 2013, 18:01:35 UTC - in response to Message 75017.  

David

do you still plan to update the Scheduler?

And if so, when do you expect to do this?

Thanks

Rayburner

We haven't changed the scheduler in a long time so it's likely the driver update that broke things.

Can anyone confirm that ralph does not have this issue?

I'll ask people here to submit more test jobs to Ralph.

I don't know when I'll be able to update the server but hopefully next month.

thanks!


ID: 75231 · Rating: 0

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 2,882
Message 75232 - Posted: 13 Mar 2013, 11:57:06 UTC - in response to Message 75229.  

Dr. David Anderson, the man who invented BOINC and is STILL its main programmer, came over and looked at Rosetta and couldn't seem to fix the problem either. I am not sure any of us have the kind of access needed to REALLY find or fix the problem. That is NOT to say he gave up and walked away; it just means it doesn't appear to be an obvious fix. New versions of BOINC come out all the time, and he could be incorporating what he thinks will fix it into them. When/if that happens, I am guessing Rosetta will post it far and wide.


I communicated with David Anderson. He said that the log messages coming from BOINC Manager do not indicate a problem with BOINC Manager/client. He suggested that, if Rosetta is the only project exhibiting this "Client error" behavior, then it is likely either a project setup problem or a server problem. He requested that, if the project administrator could not fix it on his/her own, they should get in contact with him.

I've sent a Private Message to David E K, suggesting that he work with David Anderson to fix this issue, but have not yet received a reply.

We beta testers are finalizing testing on a version of BOINC that will soon be released publicly, which is why I was vocal about determining the cause of this issue.
If it's a BOINC Manager/client issue, then it should be fixed before release, but as I said, it appears to be a project/server (backend) issue.

Regardless, I just want it fixed.
David E K, can you please make progress fixing it?

Thanks,
Jacob


Sorry, I did not know that part of it, nor that you were a beta tester. I am now complete, as I learned something new EARLY today; it is 7:45 AM for me.

I think the problem is a Rosetta problem too: it does NOT affect their Beta project, just Rosetta. Also, since most projects heavily modify the BOINC server software to meet their own needs, it is an easy thing to blame. And it is a likely cause, as other projects would be having problems too if it were a BOINC server-side problem, and they just aren't.

I also agree that it would be REALLY NICE if David E K would spend more time on the problem than a couple of spare minutes here or there. Spending a day, at least, trying to fix it should eliminate a lot of the possible problem areas and leave less to check the next time around. Ten different people checking for a solution leads to A LOT of duplication of effort! This problem has gone on for SOOOO long it is a wonder that some of us are STILL here crunching! I am only here for the stats. I left when it didn't work for me either, and only came back to reach a goal; I am then moving on to a project that seems to care more for its volunteers!!
ID: 75232 · Rating: 0

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 75233 - Posted: 13 Mar 2013, 18:37:44 UTC

Updating to the newest NVIDIA drivers and to the latest BOINC version worked like a charm. It has worked on two different PCs previously affected by the bug.

Example:
https://boinc.bakerlab.org/rosetta/results.php?hostid=1603147

I can't overclock the GPU as hard as I could with the older drivers, but oh well.
ID: 75233 · Rating: 0

JuhaM
Joined: 2 Nov 07
Posts: 3
Credit: 2,740,103
RAC: 325
Message 75236 - Posted: 15 Mar 2013, 5:07:41 UTC

Now tasks seem to be successful, 9 tasks so far. Tasks 568702383, 568679408, 568575892, ...

Changes since 7th February: the kernel changed from 3.5.0-23 to 3.5.0-25 and BOINC from 7.0.27 to 7.0.54.

Hardware and software info:

BOINC
- version 7.0.54
- BOINC installed from ppa:costamagnagianfranco/boinc development repository LocutusOfBorg BOINC repo

CPU
- Model: 21.1.2 "AMD FX(tm)-6100 Six-Core Processor"

GPU
- NVidia GTX 460
- driver 304.51 ( from Ubuntu repository)

OS
- Ubuntu 12.10
- Kernel Linux 3.5.0-25-generic #39-Ubuntu SMP x86_64

RAM 16 GB
ID: 75236 · Rating: 0

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 75239 - Posted: 15 Mar 2013, 19:05:03 UTC - in response to Message 75236.  

Now tasks seem to be successful, 9 tasks so far. Tasks 568702383, 568679408, 568575892, ...

Changes since 7th February: the kernel changed from 3.5.0-23 to 3.5.0-25 and BOINC from 7.0.27 to 7.0.54.

Hardware and software info:

BOINC
- version 7.0.54
- BOINC installed from ppa:costamagnagianfranco/boinc development repository LocutusOfBorg BOINC repo

CPU
- Model: 21.1.2 "AMD FX(tm)-6100 Six-Core Processor"

GPU
- NVidia GTX 460
- driver 304.51 ( from Ubuntu repository)

OS
- Ubuntu 12.10
- Kernel Linux 3.5.0-25-generic #39-Ubuntu SMP x86_64

RAM 16 GB


Hmmm, it seems that the new BOINC version fixed the bug, not the NVIDIA driver.
ID: 75239 · Rating: 0

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75240 - Posted: 15 Mar 2013, 19:10:40 UTC
Last modified: 15 Mar 2013, 19:12:08 UTC

It's strange that it works for some, but not for me.
I am STILL getting Client error for my Rosetta@Home Rosetta Mini tasks.

Windows 8 Professional with Media Center x64
Intel Core i7 965 XE, Quad-core
GPU Device 0: nVidia GTX 660 Ti FTW 3GB
GPU Device 1: nVidia GTX 460 1GB
nVidia Drivers: 314.14 Beta
Boinc: 7.0.56 Beta x64

David E K, can you please make progress fixing it, contacting David Anderson if necessary?

My most recent task that resulted in Client error:
https://boinc.bakerlab.org/rosetta/result.php?resultid=568810261

- Jacob
ID: 75240 · Rating: 0

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75241 - Posted: 16 Mar 2013, 1:51:46 UTC
Last modified: 16 Mar 2013, 2:16:02 UTC

I tested running only this project, with all my other projects suspended so that no GPUs were used. I restarted BOINC, got a task, and let it process. That did not work: I watched it complete/upload, and the result was: Client error.
I tested setting the Rosetta@Home preference for Target CPU run time, changing it from "not selected" to "1 hour", still with other projects suspended. I restarted BOINC, got the task, watched it complete/upload, and the result was: Client error.
I removed the project from BOINC and re-added it to test whether that would help at all, but it didn't; the result was: Client error.

One thing I'm noticing, though, and I think it's very relevant.. and has been mentioned before, is...
The tasks that have
Outcome: Client error
also have:
application version: ---


That application version is missing!
Is the scheduler request not being parsed/processed correctly?
Perhaps it gets tripped up when certain CUDA or OpenCL blocks or texts are included within it?
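
One quick check on the parsing question is to run a saved copy of the request through a strict XML parser. The sketch below assumes the request text is available as a string; check_request is a hypothetical helper, and the two sample strings are cut-down stand-ins, not a real request. The project server may not parse requests the way a strict XML library does, so a clean result here narrows things down but does not prove the server handled the request correctly.

```python
import xml.etree.ElementTree as ET


def check_request(xml_text):
    """Return (True, top-level tag names) if the text parses, else (False, error text).

    check_request is a hypothetical helper for this example, not part of BOINC.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, [child.tag for child in root]


# Cut-down stand-ins for a request: one complete, one missing its close tag.
good = (
    "<scheduler_request><hostid>1527365</hostid>"
    "<platform_name>windows_x86_64</platform_name></scheduler_request>"
)
truncated = "<scheduler_request><hostid>1527365</hostid>"

print(check_request(good))
print(check_request(truncated))
```

A missing close tag like the one in the second string is the kind of problem that the "No close tag in scheduler reply" message quoted earlier in the thread points to, although that message concerns the server's reply rather than the client's request.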


Below is a full scheduler request, where a result was uploaded, whose outcome was Client error. (I have blanked out the authenticator, for my protection)
Could someone inspect my scheduler request below, to see if anything looks wrong?

<scheduler_request>
<authenticator>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</authenticator>
<hostid>1527365</hostid>
<rpc_seqno>3519</rpc_seqno>
<core_client_major_version>7</core_client_major_version>
<core_client_minor_version>0</core_client_minor_version>
<core_client_release>56</core_client_release>
<resource_share_fraction>0.003311</resource_share_fraction>
<rrs_fraction>1.000000</rrs_fraction>
<prrs_fraction>1.000000</prrs_fraction>
<duration_correction_factor>0.859584</duration_correction_factor>
<allow_multiple_clients>0</allow_multiple_clients>
<sandbox>0</sandbox>
<work_req_seconds>0.000000</work_req_seconds>
<cpu_req_secs>0.000000</cpu_req_secs>
<cpu_req_instances>0.000000</cpu_req_instances>
<estimated_delay>0.000000</estimated_delay>
<client_cap_plan_class>1</client_cap_plan_class>
<platform_name>windows_x86_64</platform_name>
<alt_platform>
<name>windows_intelx86</name>
</alt_platform>
<code_sign_key>
1024
da94985671f399f2ccbb8711004a4d7b782f239babc54e4db341dd1c7b94fbf3
745d30084d332667546f400f5063e683c812a69a5d0945f53f0421961337e3f0
cfad19652eb4f50523473f92ee3b1f43d358a5ba911479e553f43c91b8a4939a
6aa5258107ef609a240bcffcfc9a19c8a8b0df99fdb9508694d499478fb0a931
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000010001
.
</code_sign_key>
<working_global_preferences>
<global_preferences>
<source_project>http://boincsimap.org/boincsimap/</source_project>
<mod_time>1359674284.000000</mod_time>
<run_on_batteries>0</run_on_batteries>
<run_if_user_active>1</run_if_user_active>
<run_gpu_if_user_active>1</run_gpu_if_user_active>
<suspend_if_no_recent_input>0.000000</suspend_if_no_recent_input>
<suspend_cpu_usage>0.000000</suspend_cpu_usage>
<start_hour>0.000000</start_hour>
<end_hour>0.000000</end_hour>
<net_start_hour>0.000000</net_start_hour>
<net_end_hour>0.000000</net_end_hour>
<leave_apps_in_memory>1</leave_apps_in_memory>
<confirm_before_connecting>0</confirm_before_connecting>
<hangup_if_dialed>0</hangup_if_dialed>
<dont_verify_images>0</dont_verify_images>
<work_buf_min_days>0.010000</work_buf_min_days>
<work_buf_additional_days>0.000000</work_buf_additional_days>
<max_ncpus_pct>100.000000</max_ncpus_pct>
<cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
<disk_interval>60.000000</disk_interval>
<disk_max_used_gb>0.000000</disk_max_used_gb>
<disk_max_used_pct>0.000000</disk_max_used_pct>
<disk_min_free_gb>0.000000</disk_min_free_gb>
<vm_max_used_pct>75.000000</vm_max_used_pct>
<ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
<ram_max_used_idle_pct>50.000000</ram_max_used_idle_pct>
<idle_time_to_run>1.000000</idle_time_to_run>
<max_bytes_sec_up>65536.000000</max_bytes_sec_up>
<max_bytes_sec_down>0.000000</max_bytes_sec_down>
<cpu_usage_limit>100.000000</cpu_usage_limit>
<daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
<daily_xfer_period_days>0</daily_xfer_period_days>
<override_file_present>1</override_file_present>
<network_wifi_only>1</network_wifi_only>
</global_preferences>
</working_global_preferences>
<global_preferences>
<source_project>http://boincsimap.org/boincsimap/</source_project>
<source_scheduler>http://boincsimap.org/boincsimap_cgi/cgi</source_scheduler>

<mod_time>1359674284</mod_time>
<run_on_batteries>0</run_on_batteries>
<run_if_user_active>0</run_if_user_active>
<run_gpu_if_user_active>0</run_gpu_if_user_active>
<idle_time_to_run>1</idle_time_to_run>
<suspend_if_no_recent_input>0</suspend_if_no_recent_input>
<suspend_cpu_usage>0</suspend_cpu_usage>
<leave_apps_in_memory>1</leave_apps_in_memory>
<cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
<max_cpus>0</max_cpus>
<max_ncpus_pct>0</max_ncpus_pct>
<cpu_usage_limit>0</cpu_usage_limit>
<disk_max_used_gb>0</disk_max_used_gb>
<disk_min_free_gb>0.001</disk_min_free_gb>
<disk_max_used_pct>0</disk_max_used_pct>
<disk_interval>60</disk_interval>
<vm_max_used_pct>75</vm_max_used_pct>
<ram_max_used_busy_pct>50</ram_max_used_busy_pct>
<ram_max_used_idle_pct>50</ram_max_used_idle_pct>
<work_buf_min_days>0.1</work_buf_min_days>
<work_buf_additional_days>0.1</work_buf_additional_days>
<confirm_before_connecting>0</confirm_before_connecting>
<hangup_if_dialed>0</hangup_if_dialed>
<max_bytes_sec_down>0</max_bytes_sec_down>
<max_bytes_sec_up>64000</max_bytes_sec_up>
<daily_xfer_limit_mb>0</daily_xfer_limit_mb>
<daily_xfer_period_days>0</daily_xfer_period_days>
<dont_verify_images>0</dont_verify_images>
</global_preferences>
<global_prefs_source_email_hash>2b494cb8c093ddb9e1feb73a3fa6fe20</global_prefs_source_email_hash>
<cross_project_id>67264dbbbc1de9e0827409efb9c4da1f</cross_project_id>
<time_stats>
<on_frac>0.970262</on_frac>
<connected_frac>0.946004</connected_frac>
<cpu_and_network_available_frac>0.937871</cpu_and_network_available_frac>
<active_frac>0.938944</active_frac>
<gpu_active_frac>0.934463</gpu_active_frac>
<client_start_time>1363380199.308359</client_start_time>
<previous_uptime>14983.386038</previous_uptime>
<now>1363395182.694397</now>
</time_stats>
<net_stats>
<bwup>31179.459994</bwup>
<avg_up>202836358.092599</avg_up>
<avg_time_up>1363395178.034811</avg_time_up>
<bwdown>186702.013829</bwdown>
<avg_down>298098215.947365</avg_down>
<avg_time_down>1363391423.880024</avg_time_down>
</net_stats>
<host_info>
<timezone>-14400</timezone>
<domain_name>RacerX</domain_name>
<ip_addr>192.168.2.102</ip_addr>
<host_cpid>ad78f9caf4e3caa8f466201d4c6becc4</host_cpid>
<p_ncpus>8</p_ncpus>
<p_vendor>GenuineIntel</p_vendor>
<p_model>Intel(R) Core(TM) i7 CPU 965 @ 3.20GHz [Family 6 Model 26 Stepping 4]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe</p_features>
<p_fpops>3401180948.754091</p_fpops>
<p_iops>11177236244.163149</p_iops>
<p_membw>125000000.000000</p_membw>
<p_calculated>1363184526.231028</p_calculated>
<p_vm_extensions_disabled>0</p_vm_extensions_disabled>
<m_nbytes>6432997376.000000</m_nbytes>
<m_cache>262144.000000</m_cache>
<m_swap>23612866560.000000</m_swap>
<d_total>297731620864.000000</d_total>
<d_free>195182985216.000000</d_free>
<os_name>Microsoft Windows 8</os_name>
<os_version>x64 Edition, (06.02.9200.00)</os_version>
</host_info>
<disk_usage>
<d_boinc_used_total>2187094135.000000</d_boinc_used_total>
<d_boinc_used_project>103786250.000000</d_boinc_used_project>
<d_project_share>0.000000</d_project_share>
</disk_usage>
<coprocs>
<coproc_cuda>
<count>2</count>
<name>GeForce GTX 660 Ti</name>
<available_ram>2894295040.000000</available_ram>
<have_cuda>1</have_cuda>
<have_opencl>1</have_opencl>
<req_secs>0.000000</req_secs>
<req_instances>0.000000</req_instances>
<estimated_delay>0.000000</estimated_delay>
<peak_flops>3021312000000.000000</peak_flops>
<cudaVersion>5000</cudaVersion>
<drvVersion>31414</drvVersion>
<totalGlobalMem>3220897792.000000</totalGlobalMem>
<sharedMemPerBlock>49152.000000</sharedMemPerBlock>
<regsPerBlock>65536</regsPerBlock>
<warpSize>32</warpSize>
<memPitch>2147483647.000000</memPitch>
<maxThreadsPerBlock>1024</maxThreadsPerBlock>
<maxThreadsDim>1024 1024 64</maxThreadsDim>
<maxGridSize>2147483647 65535 65535</maxGridSize>
<clockRate>1124000</clockRate>
<totalConstMem>65536.000000</totalConstMem>
<major>3</major>
<minor>0</minor>
<textureAlignment>512.000000</textureAlignment>
<deviceOverlap>1</deviceOverlap>
<multiProcessorCount>7</multiProcessorCount>
<coproc_opencl>
<name>GeForce GTX 660 Ti</name>
<vendor>NVIDIA Corporation</vendor>
<vendor_id>4318</vendor_id>
<available>1</available>
<half_fp_config>0</half_fp_config>
<single_fp_config>63</single_fp_config>
<double_fp_config>63</double_fp_config>
<endian_little>1</endian_little>
<execution_capabilities>1</execution_capabilities>
<extensions>cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 </extensions>
<global_mem_size>3220897792</global_mem_size>
<local_mem_size>49152</local_mem_size>
<max_clock_frequency>1124</max_clock_frequency>
<max_compute_units>7</max_compute_units>
<opencl_platform_version>OpenCL 1.1 CUDA 4.2.1</opencl_platform_version>
<opencl_device_version>OpenCL 1.1 CUDA</opencl_device_version>
<opencl_driver_version>314.14</opencl_driver_version>
</coproc_opencl>
</coproc_cuda>
</coprocs>
<result>
<name>rb_03_15_37629_71478__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76615_428_0</name>
<final_cpu_time>3574.825000</final_cpu_time>
<final_elapsed_time>3584.572634</final_elapsed_time>
<exit_status>0</exit_status>
<state>5</state>
<platform>windows_x86_64</platform>
<version_num>345</version_num>
<app_version_num>345</app_version_num>
<stderr_out>
<core_client_version>7.0.56</core_client_version>
<![CDATA[
<stderr_txt>
[2013- 3-15 19:53: 3:] :: BOINC:: Initializing ... ok.
[2013- 3-15 19:53: 3:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_rb_03_15_37629_71478__t000__1_C1_robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 3600
======================================================
DONE :: 6 starting structures 3574.79 cpu seconds
This process generated 6 decoys from 6 attempts
======================================================
BOINC :: WS_max 4.27262e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
</stderr_out>
<file_info>
<name>rb_03_15_37629_71478__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76615_428_0_0</name>
<nbytes>181068.000000</nbytes>
<max_nbytes>25000000.000000</max_nbytes>
<md5_cksum>3a9fac1c29c27424c4d2bb61e0e32228</md5_cksum>
<upload_url>http://srv1.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
</file_info>
</result>
<app_versions>
<app_version>
<app_name>minirosetta</app_name>
<version_num>345</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>3401180948.754091</flops>
<api_version>6.5.0</api_version>
</app_version>
</app_versions>
<other_results>
</other_results>
</scheduler_request>
ID: 75241
Jacob Klein

Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75243 - Posted: 16 Mar 2013, 11:49:11 UTC - in response to Message 75241.  
Last modified: 16 Mar 2013, 12:07:28 UTC

So, to continue my testing: I uninstalled everything nVidia, restarted, got some Rosetta tasks, and let them process. The scheduler request that reported the completed tasks did not have any blocks for <coprocs>, <coproc_cuda>, or <coproc_opencl>.

And guess what: it worked, and the task details show "Outcome: Success" and "application version: 3.45".
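
As a side note on how "application version: 3.45" relates to the <version_num>345</version_num> in the request: a small Python sketch, assuming the number encodes major * 100 + minor (which matches 345 and 3.45 here). The function name is hypothetical, and the "---" fallback is only my guess at how a missing version ends up displayed on the task page.

```python
def format_app_version(version_num):
    """Render a version_num such as 345 as '3.45'; show '---' when it is missing."""
    if version_num is None:
        # Assumption: a result with no recorded app version is shown as '---'.
        return "---"
    major, minor = divmod(int(version_num), 100)
    return f"{major}.{minor:02d}"


if __name__ == "__main__":
    print(format_app_version(345))
    print(format_app_version(None))
```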

So...
Rosetta Project admins...

The post right above has the scheduler request that results in Client error.
This post right here has the scheduler request that results in Success.

It seems that the bug may be in your code's processing/parsing of a scheduler request XML block that has one or more of the following tags:
<coprocs>, <coproc_cuda>, <coproc_opencl>
... possibly also dependent on the details within those tags.
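
To make the comparison between the two requests repeatable, here is a rough Python sketch that lists the tag names present in one request but not in the other. The function names are made up for illustration, and it assumes both requests are available as plain text. It uses a simple pattern match rather than an XML parser, so it does not validate anything; it only compares which opening tags occur.

```python
import re


def tag_names(request_text):
    """Collect the names of all opening tags, e.g. '<hostid>' -> 'hostid'."""
    # Closing tags ('</x>') and '<![CDATA[' do not match this pattern.
    return set(re.findall(r"<([A-Za-z_][A-Za-z0-9_]*)>", request_text))


def tags_only_in_first(failing_text, working_text):
    """Tag names present in the failing request but absent from the working one."""
    return sorted(tag_names(failing_text) - tag_names(working_text))


if __name__ == "__main__":
    # Tiny hand-written samples, not real requests.
    failing = "<hostid>1</hostid><coprocs><coproc_cuda><count>2</count></coproc_cuda></coprocs>"
    working = "<hostid>1</hostid>"
    print(tags_only_in_first(failing, working))
```

The comparison is one-directional: tags that appear only in the working request are not reported unless you swap the arguments.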

Please find a way to fix this!
I've done everything I possibly can to help you.
It's on you now to actually fix it!

Until you do, you are WASTING TONS OF PEOPLE'S TIME (since all their work gets invalidated)


<scheduler_request>
<authenticator>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</authenticator>
<hostid>1527365</hostid>
<rpc_seqno>9</rpc_seqno>
<core_client_major_version>7</core_client_major_version>
<core_client_minor_version>0</core_client_minor_version>
<core_client_release>56</core_client_release>
<resource_share_fraction>0.003311</resource_share_fraction>
<rrs_fraction>1.000000</rrs_fraction>
<prrs_fraction>1.000000</prrs_fraction>
<duration_correction_factor>0.807570</duration_correction_factor>
<allow_multiple_clients>0</allow_multiple_clients>
<sandbox>0</sandbox>
<work_req_seconds>0.000000</work_req_seconds>
<cpu_req_secs>0.000000</cpu_req_secs>
<cpu_req_instances>0.000000</cpu_req_instances>
<estimated_delay>0.000000</estimated_delay>
<client_cap_plan_class>1</client_cap_plan_class>
<platform_name>windows_x86_64</platform_name>
<alt_platform>
<name>windows_intelx86</name>
</alt_platform>
<code_sign_key>
1024
da94985671f399f2ccbb8711004a4d7b782f239babc54e4db341dd1c7b94fbf3
745d30084d332667546f400f5063e683c812a69a5d0945f53f0421961337e3f0
cfad19652eb4f50523473f92ee3b1f43d358a5ba911479e553f43c91b8a4939a
6aa5258107ef609a240bcffcfc9a19c8a8b0df99fdb9508694d499478fb0a931
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000010001
.
</code_sign_key>
<working_global_preferences>
<global_preferences>
<source_project>http://boincsimap.org/boincsimap/</source_project>
<mod_time>1359674284.000000</mod_time>
<run_on_batteries>0</run_on_batteries>
<run_if_user_active>1</run_if_user_active>
<run_gpu_if_user_active>1</run_gpu_if_user_active>
<suspend_if_no_recent_input>0.000000</suspend_if_no_recent_input>
<suspend_cpu_usage>0.000000</suspend_cpu_usage>
<start_hour>0.000000</start_hour>
<end_hour>0.000000</end_hour>
<net_start_hour>0.000000</net_start_hour>
<net_end_hour>0.000000</net_end_hour>
<leave_apps_in_memory>1</leave_apps_in_memory>
<confirm_before_connecting>0</confirm_before_connecting>
<hangup_if_dialed>0</hangup_if_dialed>
<dont_verify_images>0</dont_verify_images>
<work_buf_min_days>0.010000</work_buf_min_days>
<work_buf_additional_days>0.000000</work_buf_additional_days>
<max_ncpus_pct>100.000000</max_ncpus_pct>
<cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
<disk_interval>60.000000</disk_interval>
<disk_max_used_gb>0.000000</disk_max_used_gb>
<disk_max_used_pct>0.000000</disk_max_used_pct>
<disk_min_free_gb>0.000000</disk_min_free_gb>
<vm_max_used_pct>75.000000</vm_max_used_pct>
<ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
<ram_max_used_idle_pct>50.000000</ram_max_used_idle_pct>
<idle_time_to_run>1.000000</idle_time_to_run>
<max_bytes_sec_up>65536.000000</max_bytes_sec_up>
<max_bytes_sec_down>0.000000</max_bytes_sec_down>
<cpu_usage_limit>100.000000</cpu_usage_limit>
<daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
<daily_xfer_period_days>0</daily_xfer_period_days>
<override_file_present>1</override_file_present>
<network_wifi_only>1</network_wifi_only>
</global_preferences>
</working_global_preferences>
<global_preferences>
<source_project>http://boincsimap.org/boincsimap/</source_project>
<source_scheduler>http://boincsimap.org/boincsimap_cgi/cgi</source_scheduler>

<mod_time>1359674284</mod_time>
<run_on_batteries>0</run_on_batteries>
<run_if_user_active>0</run_if_user_active>
<run_gpu_if_user_active>0</run_gpu_if_user_active>
<idle_time_to_run>1</idle_time_to_run>
<suspend_if_no_recent_input>0</suspend_if_no_recent_input>
<suspend_cpu_usage>0</suspend_cpu_usage>
<leave_apps_in_memory>1</leave_apps_in_memory>
<cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
<max_cpus>0</max_cpus>
<max_ncpus_pct>0</max_ncpus_pct>
<cpu_usage_limit>0</cpu_usage_limit>
<disk_max_used_gb>0</disk_max_used_gb>
<disk_min_free_gb>0.001</disk_min_free_gb>
<disk_max_used_pct>0</disk_max_used_pct>
<disk_interval>60</disk_interval>
<vm_max_used_pct>75</vm_max_used_pct>
<ram_max_used_busy_pct>50</ram_max_used_busy_pct>
<ram_max_used_idle_pct>50</ram_max_used_idle_pct>
<work_buf_min_days>0.1</work_buf_min_days>
<work_buf_additional_days>0.1</work_buf_additional_days>
<confirm_before_connecting>0</confirm_before_connecting>
<hangup_if_dialed>0</hangup_if_dialed>
<max_bytes_sec_down>0</max_bytes_sec_down>
<max_bytes_sec_up>64000</max_bytes_sec_up>
<daily_xfer_limit_mb>0</daily_xfer_limit_mb>
<daily_xfer_period_days>0</daily_xfer_period_days>
<dont_verify_images>0</dont_verify_images>
</global_preferences>
<global_prefs_source_email_hash>2b494cb8c093ddb9e1feb73a3fa6fe20</global_prefs_source_email_hash>
<cross_project_id>67264dbbbc1de9e0827409efb9c4da1f</cross_project_id>
<time_stats>
<on_frac>0.965821</on_frac>
<connected_frac>0.947716</connected_frac>
<cpu_and_network_available_frac>0.939162</cpu_and_network_available_frac>
<active_frac>0.940553</active_frac>
<gpu_active_frac>0.936244</gpu_active_frac>
<client_start_time>1363431785.000399</client_start_time>
<previous_uptime>2294.042604</previous_uptime>
<now>1363434079.043003</now>
</time_stats>
<net_stats>
<bwup>25690.154122</bwup>
<avg_up>198255376.714563</avg_up>
<avg_time_up>1363434074.940483</avg_time_up>
<bwdown>218980.828963</bwdown>
<avg_down>305281760.260450</avg_down>
<avg_time_down>1363431824.971508</avg_time_down>
</net_stats>
<host_info>
<timezone>-14400</timezone>
<domain_name>RacerX</domain_name>
<ip_addr>192.168.2.102</ip_addr>
<host_cpid>4fb15136e844af76c810aa64b167a211</host_cpid>
<p_ncpus>8</p_ncpus>
<p_vendor>GenuineIntel</p_vendor>
<p_model>Intel(R) Core(TM) i7 CPU 965 @ 3.20GHz [Family 6 Model 26 Stepping 4]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe</p_features>
<p_fpops>3401180948.754091</p_fpops>
<p_iops>11177236244.163149</p_iops>
<p_membw>125000000.000000</p_membw>
<p_calculated>1363184526.231028</p_calculated>
<p_vm_extensions_disabled>0</p_vm_extensions_disabled>
<m_nbytes>6432997376.000000</m_nbytes>
<m_cache>262144.000000</m_cache>
<m_swap>23612866560.000000</m_swap>
<d_total>297731620864.000000</d_total>
<d_free>193927839744.000000</d_free>
<os_name>Microsoft Windows 8</os_name>
<os_version>x64 Edition, (06.02.9200.00)</os_version>
</host_info>
<disk_usage>
<d_boinc_used_total>2390351966.000000</d_boinc_used_total>
<d_boinc_used_project>664129130.000000</d_boinc_used_project>
<d_project_share>0.000000</d_project_share>
</disk_usage>
<result>
<name>cys82__1424_relax_SAVE_ALL_OUT_76549_177_0</name>
<final_cpu_time>3547.010000</final_cpu_time>
<final_elapsed_time>3826.862356</final_elapsed_time>
<exit_status>0</exit_status>
<state>5</state>
<platform>windows_x86_64</platform>
<version_num>345</version_num>
<app_version_num>345</app_version_num>
<stderr_out>
<core_client_version>7.0.56</core_client_version>
<![CDATA[
<stderr_txt>
[2013- 3-16 6:18:53:] :: BOINC:: Initializing ... ok.
[2013- 3-16 6:18:53:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/cys82__1424_fold_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 3600
[2013- 3-16 6:54:51:] :: BOINC:: Initializing ... ok.
[2013- 3-16 6:54:51:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
[2013- 3-16 7: 3:36:] :: BOINC:: Initializing ... ok.
[2013- 3-16 7: 3:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/cys82__1424_fold_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 3600
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk1_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk2_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk3_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk4_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk5_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk6_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk7_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk8_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk9_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk10_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk11_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk12_fa ... success!
Continuing computation from checkpoint: chk_00001_00016_FastRelax__chk13_fa ... success!
======================================================
DONE :: 48 starting structures 3546.9 cpu seconds
This process generated 48 decoys from 48 attempts
======================================================
BOINC :: WS_max 3.63799e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
</stderr_out>
<file_info>
<name>cys82__1424_relax_SAVE_ALL_OUT_76549_177_0_0</name>
<nbytes>666712.000000</nbytes>
<max_nbytes>50000000.000000</max_nbytes>
<md5_cksum>21db0c0fe695f7bf4c8daea720b094f3</md5_cksum>
<upload_url>http://srv3.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
</file_info>
</result>
<app_versions>
<app_version>
<app_name>minirosetta</app_name>
<version_num>345</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>3401180948.754091</flops>
<api_version>6.5.0</api_version>
</app_version>
</app_versions>
<other_results>
<other_result>
<name>ActCys_P2_2x3_s3_f1_abinitio_design_y132_001_76654_59_0</name>
<app_version>0</app_version>
</other_result>
<other_result>
<name>rb_03_15_37650_71532__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_76667_2152_0</name>
<app_version>0</app_version>
</other_result>
<other_result>
<name>cys82__1601_relax_SAVE_ALL_OUT_76558_208_0</name>
<app_version>0</app_version>
</other_result>
</other_results>
</scheduler_request>
ID: 75243

©2024 University of Washington
https://www.bakerlab.org