All WUs error on one system, but are OK on another

JStateson
Joined: 7 May 07
Posts: 15
Credit: 4,061,331
RAC: 0
Message 74898 - Posted: 15 Jan 2013, 18:29:28 UTC
Last modified: 15 Jan 2013, 18:34:24 UTC

This system had nothing but errors. It is an Opteron 290 (the faster one) with 2 GB of memory.

This one is a slower Opteron 275 with 4 GB of memory. Almost all of its WUs are good.

Looking at stderr_txt, I don't see anything exceptional, except that all the errors are on k8ndre-1.

I don't see what is causing the problem; maybe someone else can. I could try adding more memory.

Both run the same version of Windows 7, but the failing motherboard is an ASUS with a GTX 650 Ti, while the other is a Tyan with a pair of GTS 250s. All of the GTX 650 Ti cards are completing their PrimeGrid tasks with valid results, and the GTS 250s generate valid results too, so what gives?
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,177,195
RAC: 3,176
Message 74909 - Posted: 16 Jan 2013, 11:40:42 UTC - in response to Message 74898.  

Try rolling back the Nvidia drivers to 306.97 or earlier; the newer drivers seem to cause at least some of the problems.
JStateson
Joined: 7 May 07
Posts: 15
Credit: 4,061,331
RAC: 0
Message 74912 - Posted: 16 Jan 2013, 13:52:24 UTC

The Rosetta work units are all CPU tasks. I am using the GPUs for the PrimeGrid challenge. I don't see how processing PrimeGrid GPU tasks can cause all the Rosetta CPU tasks to fail. However, after 30 years of programming, I know that one can only be 99.999999...% certain that software will behave as designed, i.e., one cannot rule out side effects, so there is a (slim) chance you might be correct.

K8NDRE is using 306.97 and, except for the tasks I aborted, it seems to have validated tasks.

S2877 is using 310.70 and all are failing ...hmm...

After the PrimeGrid challenge completes, I will roll back the driver. I am in 13th place in the challenge and it would be unlucky to roll it back now.
JStateson
Joined: 7 May 07
Posts: 15
Credit: 4,061,331
RAC: 0
Message 74914 - Posted: 16 Jan 2013, 17:58:47 UTC

I had these reversed, as I was not paying attention to which web page was on the screen. Rosetta does not show GPU info, so I pulled up another project that shows the GPU driver version, but I got the two systems mixed up. I have a (small) BOINC farm and it is easy to confuse systems when they are not in front of me.

K8NDRE, which fails all tasks, is running 306.97, and the one that seems to be working just fine, S2877, has a later version, 310.70, so a rollback would not solve the problem here. If anything, I need to update K8NDRE from 306.97 to the latest drivers.

It would be nice if the "show_host_detail" web page here showed the GPU and driver version even if the project does not use a GPU. Likewise, it would be nice to be able to select "valid", "invalid", or "error" task results, etc., to get a quick count of problems.



mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,177,195
RAC: 3,176
Message 74916 - Posted: 17 Jan 2013, 12:52:16 UTC - in response to Message 74912.  

No one knows how the GPU driver can make the CPU tasks fail, but it does! That is just part of the frustrating nature of these problems. The Rosetta admins PROMISED to help but have not; the ONLY info we have is from users who have found answers through MANY trial-and-error sessions. ALL Rosetta has said is that 'it works just fine on the beta site'!!



©2024 University of Washington
https://www.bakerlab.org