Client errors

Message boards : Number crunching : Client errors

Rick A. Sponholz
Joined: 6 Sep 10
Posts: 14
Credit: 7,823,937
RAC: 0
Message 75264 - Posted: 20 Mar 2013, 18:59:52 UTC - in response to Message 75243.  
Last modified: 20 Mar 2013, 19:00:41 UTC

[quote]So... To continue my testing... I uninstalled everything nVidia, restarted, got some Rosetta tasks, and let them process. The scheduler request (which reported the completed tasks) did not have any blocks for <coprocs>, <coproc_cuda>, or <coproc_opencl>.

And guess what. It worked, and the Task details shows "Outcome: Success" and "application version: 3.45"

So...
Rosetta Project admins...

The post right above has the scheduler request that results in Client error.
This post right here has the scheduler request that results in Success.

It seems that the bug may be in your code's processing/parsing of a scheduler request XML block that has one or more of the following tags:
<coprocs>, <coproc_cuda>, <coproc_opencl>
... possibly also dependent on the details within those tags.

Please find a way to fix this!
I've done everything I possibly can to help you.
It's on you now to actually fix it!

Until you do, you are WASTING TONS OF PEOPLE'S TIME (since all their work gets invalidated).
[/quote]

Dear Rosetta Admins,
Please respond to JacobKlein's post. Do you acknowledge the problem? Do you agree with the cause? When are you going to fix this? I'm waiting for your reply, and I bet many other volunteers are waiting too. Thanks In Advance, Rick

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 2,882
Message 75268 - Posted: 21 Mar 2013, 10:56:21 UTC - in response to Message 75264.  


In the past, when they did respond, the admins or their reps said things were progressing normally and the errors were within acceptable limits. Dr. David Anderson, the main BOINC programmer, DID come here, and a newer version of BOINC DID seem to help, but the still newer versions seem to be having the same old problems again.

IMO there are too many other BOINC projects that can use our help. If someone can't get Rosetta to work properly, they should just move on and put Rosetta on the back burner until the project gets its act together. This fishing spot seems to be taking all the bait without letting anyone actually catch the fish; it is time to move to another spot. The sea is full of other spots!

JAMES DORISIO
Joined: 25 Dec 05
Posts: 15
Credit: 201,474,191
RAC: 25,286
Message 75269 - Posted: 21 Mar 2013, 12:12:26 UTC

This problem appears to be fixed for me. I have 5 Linux computers running under my name, and as of 3-20-13 they all started returning successful tasks; before this, they were all client errors. I have made no changes to them, not even a reboot. I don't see any posts from the Rosetta admins saying they changed anything, but something has changed.
I would suggest that anybody with this problem enable new work to see if this problem is really fixed.
Thanks, Jim

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75270 - Posted: 21 Mar 2013, 16:34:38 UTC
Last modified: 21 Mar 2013, 17:22:54 UTC

The problem appears to have been fixed.

I don't know the details of the fix, but from what I understand...
The Rosetta Admins and David Anderson were able to work together, using the scheduler requests that I posted, to identify and correct the problem. Rosetta's server software was recompiled, but I'm not sure if code was changed/updated; http://srv4.bakerlab.org/rosetta_cgi/cgi still shows scheduler version 605.

My tasks are currently resulting in:
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Validate state: Valid
application version: 3.45
... even though I have nVidia GPUs capable of OpenCL work, using the latest nVidia drivers!

So, as far as I can tell, IT'S FIXED!
Thank you Rosetta for finally resolving this issue.

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 75271 - Posted: 21 Mar 2013, 20:20:58 UTC - in response to Message 75270.  
Last modified: 21 Mar 2013, 20:24:41 UTC

corrected typo

The problem is fixed for me too. My host returned its first valid results ever.

Thank you, Rosetta and David Anderson, for finally fixing it.

Rayburner


Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 75273 - Posted: 22 Mar 2013, 2:06:33 UTC - in response to Message 75270.  

I thought the latest drivers and latest BOINC versions fixed the issue...

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75274 - Posted: 22 Mar 2013, 3:54:05 UTC - in response to Message 75273.  

I thought the latest drivers and latest BOINC versions fixed the issue...


You thought wrong, as evidenced by my results in a post within this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6177&nowrap=true#75240

But, I do believe the issue is truly fixed now.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 2,882
Message 75275 - Posted: 22 Mar 2013, 11:48:34 UTC - in response to Message 75274.  

You thought wrong, as evidenced by my results in a post within this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6177&nowrap=true#75240

But, I do believe the issue is truly fixed now.


I am AMAZED and VERY HAPPY that they were able to keep working on it until it was HOPEFULLY fixed permanently!! WELL DONE ROSETTA!!! Now they just need to put it on the home page to let EVERYONE know it is fixed!!

Student
Joined: 24 Oct 06
Posts: 3
Credit: 57,404
RAC: 0
Message 75277 - Posted: 23 Mar 2013, 14:45:50 UTC
Last modified: 23 Mar 2013, 14:46:15 UTC

Upgrading BOINC Manager from 7.0.28 to 7.0.52 didn't help. Upgrading the nVidia driver to 314.07 helped :-D. So far, 8 successful tasks. Great!!!!

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75278 - Posted: 23 Mar 2013, 15:35:47 UTC - in response to Message 75277.  
Last modified: 23 Mar 2013, 15:37:56 UTC

Upgrading BOINC Manager from 7.0.28 to 7.0.52 didn't help. Upgrading the nVidia driver to 314.07 helped :-D. So far, 8 successful tasks. Great!!!!


I just want it to be clear that:
The real fix was that the server software was recompiled on 3/20/2013.
It had nothing to do with BOINC Manager version, and nothing to do with nVidia driver version.

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 75279 - Posted: 23 Mar 2013, 16:56:43 UTC - in response to Message 75278.  

I just want it to be clear that:
The real fix was that the server software was recompiled on 3/20/2013.
It had nothing to do with BOINC Manager version, and nothing to do with nVidia driver version.



I'm confused; there were users who reported success before 3/20 by upgrading drivers.

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75280 - Posted: 23 Mar 2013, 17:26:52 UTC - in response to Message 75279.  
Last modified: 23 Mar 2013, 17:28:41 UTC

I'm confused; there were users who reported success before 3/20 by upgrading drivers.


As best I can tell, the task result status really depended on whether the scheduler request that reported the result contained certain XML data. It's probable that clients without nVidia GPUs never saw the problem. It's possible that certain GPU configurations never had problems. And it's also possible that certain driver versions put data into the XML scheduler request that triggered the problem, while other driver versions didn't.

The problem itself was that the server was not properly handling certain XML within the scheduler requests. I don't know any more details than that.
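For anyone who wants to check their own host, here is a minimal sketch (not anything the Rosetta admins or David Anderson published) that reports which of the three tags named earlier in this thread, <coprocs>, <coproc_cuda> and <coproc_opencl>, appear in a saved scheduler request. It assumes the request is well-formed XML and that you already have a copy of it; the function name `find_coproc_tags` is made up for this example. It only tells you whether the tags are present; it cannot tell you whether a given server will mishandle them.

```python
import xml.etree.ElementTree as ET

# The three tag names come from this thread; nothing else about the
# request layout (root element name, nesting) is assumed.
SUSPECT_TAGS = ("coprocs", "coproc_cuda", "coproc_opencl")


def find_coproc_tags(xml_text: str) -> list:
    """Return the suspect tag names found anywhere in a scheduler request.

    Hypothetical helper. Assumes xml_text is well-formed XML and raises
    xml.etree.ElementTree.ParseError otherwise. root.iter() visits the root
    and every descendant, so it does not matter how deeply the tags are nested.
    """
    root = ET.fromstring(xml_text)
    seen = {elem.tag for elem in root.iter() if elem.tag in SUSPECT_TAGS}
    # Report in a fixed order so results are easy to compare between hosts.
    return [tag for tag in SUSPECT_TAGS if tag in seen]
```

Run on a request like the one described at the top of this page (nVidia software uninstalled, no coprocessor blocks), it would return an empty list. The thread doesn't say where the request file is stored, so pass in the text of whatever saved copy you have.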

I had notified the Rosetta project admins, as well as David Anderson. I do know they recompiled the server software on 3/20/2013, and I was asked to re-test it. And when I re-tested using the exact same GPU configuration, BOINC Manager version, and nVidia driver version, the results were successful instead of Client Error.

It's the recompile of the server software that fixed this nasty error.

Kenneth DePrizio
Joined: 15 Jul 07
Posts: 15
Credit: 3,123,915
RAC: 0
Message 75281 - Posted: 23 Mar 2013, 21:14:57 UTC

Yeah, I can confirm that the server software is what fixed the problem for me. Upgrading drivers did nothing before.

JugNut
Joined: 30 Apr 12
Posts: 11
Credit: 2,437,453
RAC: 0
Message 75282 - Posted: 24 Mar 2013, 1:06:12 UTC

Wooo Hoo, my first successful WUs since installing my new cruncher (i7 3930k) some 6 months ago. FINALLY!!!

googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,783,789
RAC: 4,928
Message 75283 - Posted: 24 Mar 2013, 3:17:44 UTC

Just to clarify for me: is it now safe to upgrade both my BOINC Manager and my nVidia driver to the latest versions? I'm at 6.12.34 (x64) and 306.97, respectively. That's how I've been avoiding client error for several months.

Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
Processor: 256.00 KB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe
OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
Memory: 15.94 GB physical, 31.88 GB virtual
Disk: 197.98 GB total, 124.20 GB free
NVIDIA GPU 0: GeForce GT 620 (driver version 30697, CUDA version 5000, compute capability 2.1, 2048MB, 182 GFLOPS peak)

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75284 - Posted: 24 Mar 2013, 3:36:10 UTC - in response to Message 75283.  
Last modified: 24 Mar 2013, 3:37:07 UTC

Just to clarify for me: is it now safe to upgrade both my BOINC Manager and my nVidia driver to the latest versions? I'm at 6.12.34 (x64) and 306.97, respectively. That's how I've been avoiding client error for several months.


I'd say yes.
I'm running the latest beta BOINC, BOINC v7.0.58 x64 Beta, along with the latest beta nVidia drivers, v314.21 x64 Beta, without any "Client error" problems.
If you only want to run "release" software, then I believe BOINC v7.0.28 and nVidia v314.07 WHQL should both work just fine.

Good luck!

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75285 - Posted: 24 Mar 2013, 17:00:49 UTC

Thank you JacobKlein for all of your efforts and initiative to help get this resolved.

To put what Jacob has been saying a little differently: the results that the clients were producing, regardless of BOINC or GPU versions, were good. This is probably part of what threw off the Rosetta admins. But some of the server code was flagging things as invalid. So the tasks were treated as invalid (i.e., reissued to another host), but credit was granted by the daily script that awards credit for such outcomes (reflecting the value to the project of learning what's working and what's not).

Since the root cause was in the server validation code, and it's now been revised to handle the various XML tags that different combinations of GPU driver version and BOINC version might throw at it, you should not see any problems from making changes on your client host.
Rosetta Moderator: Mod.Sense

Jacob Klein
Joined: 3 Jul 07
Posts: 15
Credit: 7,098,747
RAC: 0
Message 75286 - Posted: 24 Mar 2013, 17:25:46 UTC
Last modified: 24 Mar 2013, 17:27:31 UTC

Awesome!

So not only is it a win for the clients (results are now successful, no more confusion or error reporting), but it's a win for Rosetta too (work is done more quickly, since results will only have to be calculated once instead of the work unit being re-sent on every Client Error).

I just spent some money to upgrade my computer's RAM from 6GB to 12GB, so I can do more Rosetta tasks at once.

Keep on crunching!

Stephen Miller
Joined: 18 Sep 05
Posts: 13
Credit: 16,294,215
RAC: 0
Message 75287 - Posted: 24 Mar 2013, 22:55:45 UTC

I nominate JacobKlein for Rosetta@home's HERO AWARD, or some other acknowledgment on the FRONT PAGE, for his persistent effort to get credit where credit is due (double entendre intended).

Crunching since 18 Sep 2005


mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 2,882
Message 75288 - Posted: 25 Mar 2013, 12:04:48 UTC - in response to Message 75287.  

I nominate JacobKlein for Rosetta@home's HERO AWARD, or some other acknowledgment on the FRONT PAGE, for his persistent effort to get credit where credit is due (double entendre intended).


I second that!!


©2024 University of Washington
https://www.bakerlab.org