Client errors

Ananas

Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 75037 - Posted: 4 Feb 2013, 9:23:38 UTC - in response to Message 75011.  
Last modified: 4 Feb 2013, 9:29:39 UTC


I JUST dumped Rosetta off of one of my pc's as it REFUSED to send new work to my pc, Boinc kept saying "Not requesting tasks" even though I had NO cpu tasks on this 6 core pc! I am now crunching for Poem on that pc and it got 50 or more cpu units with NO problem!! There are WAAAY too many fish in the sea to waste time on one project that is being a PITA!!!!


Sounds more like a core client problem (bugs? in BOINC? say it's not so) than a scheduler issue.


I don't know I even reset the project and every other pc is working just fine on Rosetta, that one just didn't! It's okay Poem loves my time!

This is definitely a core client problem. It usually occurs when the cache is really full (high values for the first two options in "Network usage") or when the other project has accumulated a really high "long term debt".
Unfortunately the BOINC GUI has no feature to reset those debts, but the command line tool can do it, e.g.:

boinccmd.exe --host <YourComputerName> --set_debts http://boinc.fzk.de/poem/ 0 0

or

boinccmd.exe --host <YourComputerName> --set_debts https://boinc.bakerlab.org/rosetta/ 0 100000


The second value is the one that sets the long term debt, but the command needs the short term value as well, which is why you have to supply both numbers.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 3,249
Message 75038 - Posted: 4 Feb 2013, 12:27:55 UTC - in response to Message 75037.  
Last modified: 4 Feb 2013, 12:28:18 UTC


I JUST dumped Rosetta off of one of my pc's as it REFUSED to send new work to my pc, Boinc kept saying "Not requesting tasks" even though I had NO cpu tasks on this 6 core pc! I am now crunching for Poem on that pc and it got 50 or more cpu units with NO problem!! There are WAAAY too many fish in the sea to waste time on one project that is being a PITA!!!!


Sounds more like a core client problem (bugs? in BOINC? say it's not so) than a scheduler issue.


I don't know I even reset the project and every other pc is working just fine on Rosetta, that one just didn't! It's okay Poem loves my time!

This is definitely a core client problem. It usually occurs when the cache is really full (high values for the first two options in "Network usage") or when the other project has accumulated a really high "long term debt".
Unfortunately the BOINC GUI has no feature to reset those debts, but the command line tool can do it, e.g.:

boinccmd.exe --host <YourComputerName> --set_debts http://boinc.fzk.de/poem/ 0 0

or

boinccmd.exe --host <YourComputerName> --set_debts https://boinc.bakerlab.org/rosetta/ 0 100000


The second value is the one that sets the long term debt, but the command needs the short term value as well, which is why you have to supply both numbers.


Okay but ALL pc's say that as one project I am trying to get to a goal on has no units, so NO project is the highest priority, except the one with the goal, but since it has no units why wouldn't Rosie fill in with some? THAT is the problem, I had NO cpu units on the pc and Rosie refused to get any despite having plenty available!! On that pc I only had 3 projects selected, 1 was a gpu project, Poem, I had it set to NOT get cpu units, and ABC which has no units and Rosie. Obviously I want to crunch Rosie when there are no ABC units. Poem got plenty of gpu units, while my cpu cores were starving. It is a 6 core pc and I only run 1 gpu unit at a time on that pc, so Poem using 1 cpu core made no difference.

BUT I am off to Poem on that pc now and it is okay. With the size of the credits here my goal is a LONG way off, 1 pc here or there for me is immaterial in the long run. Thanks for your help!
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 75041 - Posted: 4 Feb 2013, 22:49:00 UTC - in response to Message 75038.  

Use <work_fetch_debug>1</work_fetch_debug> in cc_config.xml to see BOINC's work fetch policy decisions in the log.
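
For reference, a minimal sketch of such a cc_config.xml, assuming the file contains no other options (the flag goes inside the log_flags section, and the file lives in the BOINC data directory):

```xml
<cc_config>
  <log_flags>
    <!-- log the client's work fetch decisions to the event log -->
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
```

The client has to re-read the file before the flag takes effect, for example by restarting it or, assuming your boinccmd version supports it, with boinccmd --read_cc_config.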
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 3,249
Message 75042 - Posted: 5 Feb 2013, 11:54:07 UTC - in response to Message 75041.  

Use <work_fetch_debug>1</work_fetch_debug> in cc_config.xml to see BOINC's work fetch policy decisions in the log.


THANKS!
Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 75043 - Posted: 5 Feb 2013, 13:52:17 UTC

Hi,
I did some tests, which surprised me a little.

First I tried to run Rosetta in a separate session without X running - both tasks ended with a client error (see results no. 560429506 and 560429572).

Then I ran Rosetta in a virtual machine (VirtualBox 4.1.18, BOINC 7.0.27 x86_64) with Windows installed. This task succeeded (task no. 560619860). There is a lot of emulated hardware, so it did not surprise me too much.

But my third try was running Rosetta in WINE and BOINC 7.0.27 x86 - this task (no. 560653185) ended with success again. There is no emulated hardware, just a few libraries, so I am quite surprised. Or is WINE such a different environment for Rosetta compared to native Linux?

Tomorrow I will try to replace the nVidia drivers with open-source drivers and I'll let you know about the result.

Greetings,
Pushkin
Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 75046 - Posted: 6 Feb 2013, 10:01:15 UTC - in response to Message 75043.  

Hi,
I did some tests, which surprised me a little.

First I tried to run Rosetta in a separate session without X running - both tasks ended with a client error (see results no. 560429506 and 560429572).

Then I ran Rosetta in a virtual machine (VirtualBox 4.1.18, BOINC 7.0.27 x86_64) with Windows installed. This task succeeded (task no. 560619860). There is a lot of emulated hardware, so it did not surprise me too much.

But my third try was running Rosetta in WINE and BOINC 7.0.27 x86 - this task (no. 560653185) ended with success again. There is no emulated hardware, just a few libraries, so I am quite surprised. Or is WINE such a different environment for Rosetta compared to native Linux?

Tomorrow I will try to replace the nVidia drivers with open-source drivers and I'll let you know about the result.

Greetings,
Pushkin


Hi,
today I installed the Nouveau driver instead of the proprietary nVidia drivers. The result is success: task no. 560822374 ended without a client error. It seems that the problem is really caused by the nVidia drivers, not the hardware itself.

Greetings,
Pushkin
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1997
Credit: 9,747,451
RAC: 10,562
Message 75047 - Posted: 6 Feb 2013, 15:11:08 UTC - in response to Message 75017.  

I'll ask people here to submit more test jobs to Ralph.


By the way, Ralph code of 3.45 sucks....
No screensaver, no checkpoint, a LOT of errors.
Please, fix it
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 75048 - Posted: 6 Feb 2013, 15:15:14 UTC - in response to Message 75046.  


Hi,
today I installed the Nouveau driver instead of the proprietary nVidia drivers. The result is success: task no. 560822374 ended without a client error. It seems that the problem is really caused by the nVidia drivers, not the hardware itself.

Greetings,
Pushkin


Nice test!

To add, on one box, I am using kmod vice nouveau without any problems. It is, however, a 1st-gen i7-950 with a GTX460, not an Ivy Bridge CPU nor the latest model video card. It seems to me that most (all?) folks having this particular problem are using nVidia cards with Ivy Bridge CPUs - but this could be wrong. Can anyone verify?
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 75049 - Posted: 6 Feb 2013, 18:18:40 UTC - in response to Message 75048.  


Hi,
today I installed the Nouveau driver instead of the proprietary nVidia drivers. The result is success: task no. 560822374 ended without a client error. It seems that the problem is really caused by the nVidia drivers, not the hardware itself.

Greetings,
Pushkin


Nice test!

To add, on one box, I am using kmod vice nouveau without any problems. It is, however, a 1st-gen i7-950 with a GTX460, not an Ivy Bridge CPU nor the latest model video card. It seems to me that most (all?) folks having this particular problem are using nVidia cards with Ivy Bridge CPUs - but this could be wrong. Can anyone verify?


Ivy Bridge here.
Bug here.
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 75050 - Posted: 6 Feb 2013, 19:54:21 UTC - in response to Message 75047.  

I'll ask people here to submit more test jobs to Ralph.


By the way, Ralph code of 3.45 sucks....
No screensaver, no checkpoint, a LOT of errors.
Please, fix it



Any specific examples would help. 3.45 is what is running on R@h so hopefully there isn't a general issue that we need to address other than the ones we are already aware of. Also, there may be a lot of errors because some lab members in our group are testing new jobs on Ralph (the main purpose of Ralph) so they may fail.
JAMES DORISIO
Joined: 25 Dec 05
Posts: 15
Credit: 201,474,191
RAC: 28,532
Message 75052 - Posted: 6 Feb 2013, 20:13:49 UTC

This computer also has this problem & is not Ivy Bridge.
Intel(R) Pentium(R) 4 CPU 3.00GHz Nvidia gts450.
Ubuntu linux 12.04 amd64 ,nvidia driver 310.14, Boinc 7.0.27
It was ok until upgrading from Ubuntu 10.04 & new drivers & boinc that came with it.

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1485068

I have been checking Ralph but it never shows tasks available. I will try to set up a computer there anyway as soon as I get a chance.

Jim
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 75053 - Posted: 6 Feb 2013, 22:10:21 UTC - in response to Message 75052.  

This computer also has this problem & is not Ivy Bridge.
Intel(R) Pentium(R) 4 CPU 3.00GHz Nvidia gts450.
Ubuntu linux 12.04 amd64 ,nvidia driver 310.14, Boinc 7.0.27
It was ok until upgrading from Ubuntu 10.04 & new drivers & boinc that came with it.

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1485068

I have been checking Ralph but it never shows tasks available. I will try to set up a computer there anyway as soon as I get a chance.

Jim


Ugh. There goes that theory, heh. I was initially wondering about the integrated graphics controller in ivy bridge CPUs with NVIDIA cards.
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 26,262,530
RAC: 19,111
Message 75054 - Posted: 7 Feb 2013, 10:35:38 UTC

Yes, Ivy Bridge is not an essential part of the bug. This bug has been seen on many CPUs, from the Pentium 4 through 4 generations of "Core" CPUs (not sure about AMD processors).

An Nvidia GPU is the essential part. It seems to be not the hardware but an active (not just installed) nv driver. And maybe not even the drivers themselves, but something driver related. There are no clearly "good" or clearly "bad" versions. But the bug is most often seen after installing / updating drivers, and it often disappears after changing to a different version.
Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 75057 - Posted: 7 Feb 2013, 11:27:07 UTC

Hi guys,
something strange happened. After all that playing with drivers I went back to the proprietary drivers, and since then my tasks have been successful - 560862664, 561025484, 561025532, 561026069 and 561026753. Did anything change in the Rosetta code, or should I start to believe in miracles? :-O

Greetings,
Pushkin
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 3,249
Message 75060 - Posted: 7 Feb 2013, 12:11:56 UTC - in response to Message 75054.  
Last modified: 7 Feb 2013, 12:12:17 UTC

Yes, Ivy Bridge is not an essential part of the bug. This bug has been seen on many CPUs, from the Pentium 4 through 4 generations of "Core" CPUs (not sure about AMD processors).

An Nvidia GPU is the essential part. It seems to be not the hardware but an active (not just installed) nv driver. And maybe not even the drivers themselves, but something driver related. There are no clearly "good" or clearly "bad" versions. But the bug is most often seen after installing / updating drivers, and it often disappears after changing to a different version.


I have an Ivy Bridge cpu in this laptop and it is working just fine. I have an Intel i7-3612QM cpu and an Intel HD Graphics 4000 gpu. Rosetta is working just fine, knock on wood! NO gpu crunching though!
JuhaM
Joined: 2 Nov 07
Posts: 3
Credit: 2,740,103
RAC: 396
Message 75061 - Posted: 7 Feb 2013, 13:30:30 UTC

Tasks 560643811, 560643810, 560643809 and 560643787 all validated as invalid. Although I got credit from them.

If I remember right, the same has happened for about the last six months. All tasks validate as invalid, but still grant credit. It's really confusing to crunch Rosetta when the outcome is this !?!

I crunch other projects at the same time, mainly POEM GPU tasks and WCG CPU tasks.

Hardware:
BOINC version 7.0.27

CPU:
Hardware Class: cpu
Arch: X86-64
Vendor: "AuthenticAMD"
Model: 21.1.2 "AMD FX(tm)-6100 Six-Core Processor"

GPU:
NVidia GTX 460
driver 304.51 (from Ubuntu repository)

Distributor ID: Ubuntu
Description: Ubuntu 12.10
Release: 12.10
Codename: quantal

Kernel: Linux 3.5.0-23-generic #35-Ubuntu SMP x86_64

RAM 16 GB
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 75062 - Posted: 7 Feb 2013, 17:37:04 UTC - in response to Message 75061.  

Tasks 560643811, 560643810, 560643809 and 560643787 all validated as invalid. Although I got credit from them.

Here you also get credits for invalid results. They are doing that AFAIK because of the rather high percentage of bad WUs.

(Although IMHO credits should be only awarded if the WU errors out for both wingmen, with the current way many people for sure ignore issues with their computers.)
JAMES DORISIO
Joined: 25 Dec 05
Posts: 15
Credit: 201,474,191
RAC: 28,532
Message 75064 - Posted: 8 Feb 2013, 12:55:26 UTC
Last modified: 8 Feb 2013, 13:00:22 UTC

Successful tasks completed on ralph@home. I managed to pick up some tasks on ralph.

Computer ralph
http://ralph.bakerlab.org/show_host_detail.php?hostid=29722

Tasks for computer ralph (all success)
http://ralph.bakerlab.org/results.php?hostid=29722

Computer rosetta
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1579123

Tasks for computer rosetta (all client error)
https://boinc.bakerlab.org/rosetta/results.php?hostid=1579123

Intel I7-3770, Ubuntu linux 12.04 amd64, nvidia driver 310.14. Boinc 7.0.27

There were no changes to this computer, same exact setup, it actually ran some ralph and rosetta tasks at the same time.

To David
I hope this confirms that ralph does not have this issue. If you have any questions please post them or PM me.
Thanks Jim
Alun
Joined: 27 Feb 10
Posts: 5
Credit: 69,418
RAC: 0
Message 75068 - Posted: 9 Feb 2013, 16:13:03 UTC
Last modified: 9 Feb 2013, 16:14:18 UTC

Is anyone actually actively investigating the issues with nvidia drivers & cards at the moment, or (as it seems from the forums) is it falling to the community to find the problem in the UoW's Rosetta applications?

Question / point: If it was purely a driver issue wouldn't we be seeing errors on other GPU projects running on the same box? GPUGrid, Milkyway, Einstein and SETI are all fine - only Rosetta gets borked by updated drivers...
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 3,249
Message 75073 - Posted: 10 Feb 2013, 12:51:58 UTC - in response to Message 75068.  

Is anyone actually actively investigating the issues with nvidia drivers & cards at the moment, or (as it seems from the forums) is it falling to the community to find the problem in the UoW's Rosetta applications?

Question / point: If it was purely a driver issue wouldn't we be seeing errors on other GPU projects running on the same box? GPUGrid, Milkyway, Einstein and SETI are all fine - only Rosetta gets borked by updated drivers...


AND as James Dorisio points out, the project's Beta site works just fine!!

©2024 University of Washington
https://www.bakerlab.org