Problems and Technical Issues with Rosetta@home

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109066 - Posted: 3 Apr 2024, 4:52:03 UTC - in response to Message 109065.  
Last modified: 3 Apr 2024, 4:53:10 UTC

Right now they are 32 core with 16gb of RAM. Which should be enough for crunching.
Some Rosetta 4.20 Tasks require over 2GB of RAM.
32 × 2 GB = 64 GB, way more than 16GB.
The larger RAM Tasks have been very few and far between, though; 500MB to 1GB has been the usual range for Rosetta 4.20 Tasks lately. And 32 × 0.5 GB = 16 GB, which is all your RAM.

16GB RAM on a system with 64 cores/threads is way, way, way too little.
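The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate, not how BOINC schedules work: the function names and the `reserve_gb` parameter (RAM held back for the OS) are hypothetical, and the per-task figures are the ones quoted in this post.

```python
def ram_needed_gb(tasks: int, gb_per_task: float) -> float:
    """Total RAM needed if every task uses gb_per_task at the same time."""
    return tasks * gb_per_task


def max_concurrent_tasks(total_ram_gb: float, gb_per_task: float,
                         reserve_gb: float = 0.0) -> int:
    """How many tasks fit in RAM after holding back reserve_gb (hypothetical)."""
    usable = max(total_ram_gb - reserve_gb, 0.0)
    return int(usable // gb_per_task)


if __name__ == "__main__":
    # 32 tasks at the occasional 2 GB peak vs. the usual 0.5 GB
    print(ram_needed_gb(32, 2.0), "GB needed at 2 GB per task")
    print(ram_needed_gb(32, 0.5), "GB needed at 0.5 GB per task")
    # How many tasks a 16 GB machine can actually hold
    print(max_concurrent_tasks(16.0, 2.0), "tasks fit at 2 GB each")
    print(max_concurrent_tasks(16.0, 0.5, reserve_gb=2.0),
          "tasks fit at 0.5 GB each with 2 GB reserved")
```

Even in the best case, the usual 0.5 GB Tasks leave nothing for the operating system once all 32 cores are busy.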
Grant
Darwin NT
ID: 109066 · Rating: 0
RDTSC
Joined: 29 Jan 24
Posts: 4
Credit: 1,288,772
RAC: 15,140
Message 109067 - Posted: 3 Apr 2024, 12:08:31 UTC - in response to Message 109009.  
Last modified: 3 Apr 2024, 12:10:22 UTC

A flock of work units arrived recently that are behaving oddly, well, all but one of them...
I have one machine, a workstation Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz / Arch Linux, which crunches Rosetta and WCG packets fine. A few months ago, added a really old dual Intel(R) Xeon(TM) CPU 2.80GHz machine (old Dell server, latest Ubuntu server LTS.) The old machine was getting Rosetta Beta workunits and choking on them; error, error, error... it was able to crunch through several non-beta workunits though. Thought it was the old CPUs, like an unsupported instruction or something. Reading this, now thinking it was bad workunits.
ID: 109067 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,933,740
RAC: 17,353
Message 109068 - Posted: 3 Apr 2024, 22:10:16 UTC
Last modified: 3 Apr 2024, 22:10:38 UTC

And we're back...

Looks like the whole website went down for about 10 hours today.
Couldn't even get to the Rosetta home page, let alone upload results.
Everything is going through fine now.
ID: 109068 · Rating: 0
GDB
Joined: 5 Oct 17
Posts: 1
Credit: 4,661,957
RAC: 6,537
Message 109069 - Posted: 4 Apr 2024, 1:54:49 UTC

All my units are getting validate errors now.
ID: 109069 · Rating: 0
MStenholm
Joined: 18 Apr 20
Posts: 19
Credit: 27,951,567
RAC: 58,470
Message 109070 - Posted: 4 Apr 2024, 4:52:17 UTC

GDB: you are not alone in having all returned results get validate errors. The top 10 CPUs I checked, plus my own, got the same verdict.
ID: 109070 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109071 - Posted: 4 Apr 2024, 5:19:12 UTC

Yep, the Validator is borked.

For me, anything returned from 3 Apr 2024, 22:02:46 UTC fails, and a quick look at the top computers shows the same thing: everything going back at present fails Validation.

If someone could get the Project's attention?
Grant
Darwin NT
ID: 109071 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,082,016
RAC: 12,014
Message 109072 - Posted: 4 Apr 2024, 7:12:50 UTC - in response to Message 109071.  

If someone could get the Project's attention?

+1
After more than 60 WUs failed some hours ago, I'm ready to upload about ten more.
Should I stop the upload?
ID: 109072 · Rating: 0
Daniel Graf
Joined: 2 Nov 05
Posts: 12
Credit: 71,975,251
RAC: 75,553
Message 109073 - Posted: 4 Apr 2024, 7:41:35 UTC

Let's see if these work units are still credited. But I have the feeling that after calculating they will go straight into the trash can. Unfortunately, one computer will be running until this afternoon and will probably only produce garbage before I can detach it from Rosetta.
ID: 109073 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109074 - Posted: 4 Apr 2024, 8:04:07 UTC - in response to Message 109073.  

Let's see if these work units are still credited.
If it's not Valid, there is no Credit.
Grant
Darwin NT
ID: 109074 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109075 - Posted: 4 Apr 2024, 8:06:52 UTC - in response to Message 109072.  

Should I stop the upload?
It will stop you from getting new work, but it is the only way to keep returned work from failing Validation until the project fixes the issue.
They could also re-run the validation of the presently failed Tasks, but I don't like the odds of that actually happening.
Grant
Darwin NT
ID: 109075 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,082,016
RAC: 12,014
Message 109077 - Posted: 4 Apr 2024, 9:50:31 UTC - in response to Message 109075.  
Last modified: 4 Apr 2024, 9:52:13 UTC

It will stop you from getting new work, but it is the only way to keep returned work from failing Validation until the project fixes the issue.

The problem is that some of these WUs are near the deadline.
It's a pity to throw away the work done... (I don't care a lot about points, I care about science)
ID: 109077 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109078 - Posted: 4 Apr 2024, 9:55:39 UTC - in response to Message 109077.  

The problem is that some of these WUs are near the deadline.
It's a pity to throw away the work done... (I don't care a lot about points, I care about science)
And it will stop you from being able to get new work from other projects as well.
So return them & get more work & hope the project fixes up all the failed Validation Tasks when they fix the Validation issue.

As it is, if they don't Validate then they won't (or they shouldn't) go into the science database. If it's not Valid, then it's not going to be of use to science. That's the whole point of Validation, otherwise it's just garbage in, and then garbage out.
Grant
Darwin NT
ID: 109078 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,082,016
RAC: 12,014
Message 109079 - Posted: 4 Apr 2024, 11:18:47 UTC - in response to Message 109078.  

As it is, if they don't Validate then they won't (or they shouldn't) go into the science database. If it's not Valid, then it's not going to be of use to science. That's the whole point of Validation, otherwise it's just garbage in, and then garbage out.


That's the point!!
Thousands of WUs wasted...
ID: 109079 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 276
Credit: 523,512
RAC: 610
Message 109080 - Posted: 4 Apr 2024, 11:21:20 UTC

I don't want to receive more posts about this unless it is a post from a staff member who fixed it.
ID: 109080 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,933,740
RAC: 17,353
Message 109081 - Posted: 4 Apr 2024, 12:04:38 UTC - in response to Message 109074.  
Last modified: 4 Apr 2024, 12:11:26 UTC

Let's see if these work units are still credited.
If it's not Valid, there is no Credit.

Rosetta does (or did) have an overnight scheduled job that sought out tasks that completed successfully but didn't validate in order to credit them anyway, but that clearly hasn't happened yet (overnight on the West Coast presumably).

Let me see if I can revive my contact to the project team, seeing as I haven't contacted them for a year.
Maybe they're less fed up with me by now.

Edit: Email sent. We wait
ID: 109081 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,082,016
RAC: 12,014
Message 109082 - Posted: 4 Apr 2024, 15:14:23 UTC - in response to Message 109080.  

unless it is a post from a staff member who fixed it.


Staff member??
ID: 109082 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 109083 - Posted: 4 Apr 2024, 22:23:29 UTC

28 tasks with validate errors... great... but I suppose that's just the way it goes with a beta.
ID: 109083 · Rating: 0
rjs5
Joined: 22 Nov 10
Posts: 274
Credit: 23,359,764
RAC: 5,784
Message 109084 - Posted: 4 Apr 2024, 22:57:07 UTC - in response to Message 109083.  

28 tasks with validate errors... great... but I suppose that's just the way it goes with a beta.


NO. That is the way Rosetta has chosen.

There should be a preference option that allows you to OPT OUT of the BETA work units. This is ESPECIALLY true if the project gives ZERO credit for the computing. About 25% of the BETA work units I am receiving run for several hours, finish without errors, and are marked INVALID as wasted work.

These INVALID results are a problem with the Rosetta BETA binary. Rosetta has chosen to run all the BETA units for hours instead of minutes. They could run the BETA binaries for minutes instead of hours until the BETA binaries have some successes.
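To put rough numbers on that argument, here is a small Python sketch. Only the 25% invalid rate comes from this post; the batch of 100 tasks and the 3-hour and 5-minute runtimes are illustrative assumptions, and `wasted_cpu_hours` is a hypothetical helper, not anything Rosetta or BOINC provides.

```python
def wasted_cpu_hours(tasks: int, invalid_fraction: float,
                     hours_per_task: float) -> float:
    """CPU hours spent on tasks that end up marked INVALID."""
    return tasks * invalid_fraction * hours_per_task


if __name__ == "__main__":
    batch = 100            # assumed batch size
    invalid_rate = 0.25    # "about 25%" from the post
    long_run = 3.0         # assumed "several hours" per task
    short_run = 5.0 / 60   # assumed 5-minute test run, in hours

    print("Hours wasted with long runs: ",
          wasted_cpu_hours(batch, invalid_rate, long_run))
    print("Hours wasted with short runs:",
          round(wasted_cpu_hours(batch, invalid_rate, short_run), 2))
```

Under those assumptions the same failure rate costs dozens of CPU hours with multi-hour tasks, but only a couple of hours if the binaries were first exercised with short runs.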
ID: 109084 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 412
Credit: 12,539,197
RAC: 13,745
Message 109085 - Posted: 5 Apr 2024, 0:04:02 UTC - in response to Message 109083.  

28 tasks with validate errors... great... but I suppose that's just the way it goes with a beta.


They might have beta in the name but these have been the production WUs for some time now.
ID: 109085 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,933,740
RAC: 17,353
Message 109086 - Posted: 5 Apr 2024, 3:05:05 UTC - in response to Message 109081.  

Let's see if these work units are still credited.
If it's not Valid, there is no Credit.

Rosetta does (or did) have an overnight scheduled job that sought out tasks that completed successfully but didn't validate in order to credit them anyway, but that clearly hasn't happened yet (overnight on the West Coast presumably).

Let me see if I can revive my contact to the project team, seeing as I haven't contacted them for a year.
Maybe they're less fed up with me by now.

Edit: Email sent. We wait

Looking like they're still fed up with me... no response & no change I can notice
ID: 109086 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org