Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Grant (SSSF)

Joined: 28 Mar 20
Posts: 1684
Credit: 17,941,438
RAC: 22,992
Message 109901 - Posted: 23 Oct 2024, 8:41:15 UTC

And the boinc-process host is down again.
Grant
Darwin NT
ID: 109901 · Rating: 0
tgbauer

Joined: 5 Jan 06
Posts: 10
Credit: 101,720,667
RAC: 60,022
Message 109903 - Posted: 24 Oct 2024, 0:22:46 UTC
Last modified: 24 Oct 2024, 0:36:15 UTC

Have a work unit that doesn't seem to be getting as far as others, and has an unusually long model (the graphics shows a dot with a line that seems to go on into infinity)
Other Tasks are running as expected.



Application: Rosetta 4.20
Name: rb_09_09_632102_625918__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_2979545_8404
State: Running
Received: Saturday, October 19, 2024 at 03:24:01 AM
Report deadline: Tuesday, October 22, 2024 at 03:24:04 AM
Estimated computation size: 80,000 GFLOPs
CPU time: 2d 14:28:52
CPU time since checkpoint: 2d 14:28:52
Elapsed time: 2d 14:12:32
Estimated time remaining: ---
Fraction done: 100.000%
Virtual memory size: 34.42 GB
Working set size: 22.83 MB
Directory: slots/2
Process ID: 17683
Progress rate: 1.440% per hour
Executable: rosetta_4.20_x86_64-apple-darwin



This is stderr.txt
command: rosetta_4.20_x86_64-apple-darwin -run:protocol jd2_scripting @flags_rb_09_09_632102_625918__t000__0_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_09_09_632102_625918__t000__0_C1_robetta.zip -frag_weight_aligned 0.5 -max_registry_shift 4 -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3499362
Using database: database_357d5d93529_n_methyl/minirosetta_database
error:  zipfile probably corrupt (segmentation violation)
error:  zipfile probably corrupt (illegal instruction)
BOINC:: CPU time: 64841.5s, 36000s + 28800s[2024-10-21 22:25: 9:] :: BOINC 
Output exists: default.out.gz Size: WARNING! cannot get file size for default.out.gz: could not open file.
-1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
error:  zipfile probably corrupt (segmentation violation)
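
The watchdog line in this stderr.txt shows a CPU time followed by two numbers that match the -boinc::cpu_run_timeout and -cpu_run_time flags on the command line. A rough sketch of that arithmetic follows, assuming (an inference from the log, not something the project has confirmed) that the watchdog steps in once CPU time passes the sum of the two. The helper names `flag_value` and `watchdog_summary` are made up for illustration.

```python
import re

# Values copied from the stderr.txt excerpt in this post.
COMMAND_FLAGS = "-cpu_run_time 28800 -boinc::cpu_run_timeout 36000"
WATCHDOG_LINE = "BOINC:: CPU time: 64841.5s, 36000s + 28800s"


def flag_value(flags: str, name: str) -> int:
    """Return the integer that follows `name` in a flag string."""
    match = re.search(re.escape(name) + r"\s+(\d+)", flags)
    if match is None:
        raise ValueError(f"flag {name} not found")
    return int(match.group(1))


def watchdog_summary(flags: str, line: str) -> dict:
    """Compare the logged CPU time with run time plus timeout.

    Assumption: the limit is simply the sum of the two flags.
    """
    run_time = flag_value(flags, "-cpu_run_time")
    timeout = flag_value(flags, "-boinc::cpu_run_timeout")
    cpu_match = re.search(r"CPU time:\s*([\d.]+)s", line)
    if cpu_match is None:
        raise ValueError("no CPU time found in log line")
    cpu_seconds = float(cpu_match.group(1))
    limit = run_time + timeout
    return {
        "limit_s": limit,
        "limit_h": limit / 3600,
        "cpu_s": cpu_seconds,
        "over_by_s": round(cpu_seconds - limit, 1),
        "exceeded": cpu_seconds > limit,
    }


if __name__ == "__main__":
    print(watchdog_summary(COMMAND_FLAGS, WATCHDOG_LINE))
```

Under that assumption the limit works out to 18 hours of CPU time, and the logged value is just past it.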

ID: 109903 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 109904 - Posted: 24 Oct 2024, 1:43:18 UTC - in response to Message 109903.  

Have a work unit that doesn't seem to be getting as far as others, and has an unusually long model (the graphics shows a dot with a line that seems to go on into infinity)
Other Tasks are running as expected.

CPU time: 2d 14:28:52
CPU time since checkpoint: 2d 14:28:52
Elapsed time: 2d 14:12:32
Estimated time remaining: ---


This is stderr.txt
error:  zipfile probably corrupt (segmentation violation)
error:  zipfile probably corrupt (illegal instruction)
BOINC:: CPU time: 64841.5s, 36000s + 28800s[2024-10-21 22:25: 9:] :: BOINC 
-----
Stream information inconsistent.
Writing W_0000001
error:  zipfile probably corrupt (segmentation violation)

It's probably already errored out by now, but with all those errors and over 2.5 days of running without starting, you should abort it if it's still going.
It hasn't started, let alone stood any chance of finishing. Let your core have something more productive to run.
ID: 109904 · Rating: 0
tgbauer

Joined: 5 Jan 06
Posts: 10
Credit: 101,720,667
RAC: 60,022
Message 109905 - Posted: 24 Oct 2024, 3:46:07 UTC - in response to Message 109904.  


It's probably already errored out by now, but with all those errors and over 2.5 days of running without starting, you should abort it if it's still going.
It hasn't started, let alone stood any chance of finishing. Let your core have something more productive to run.


Fortunately this seems to be a one-off and other tasks are processing as expected.
Restarting the BOINC client caused it to realize it needed to error out this task.
Maybe at some point the BOINC client will recognize similar errors (for any project) so that a manual restart or abort isn't needed.
ID: 109905 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1684
Credit: 17,941,438
RAC: 22,992
Message 109906 - Posted: 24 Oct 2024, 4:29:38 UTC - in response to Message 109901.  

And the boinc-process host is down again.
Still dead, so still no work being Validated.
Grant
Darwin NT
ID: 109906 · Rating: 0
tgbauer

Joined: 5 Jan 06
Posts: 10
Credit: 101,720,667
RAC: 60,022
Message 109908 - Posted: 24 Oct 2024, 12:39:37 UTC - in response to Message 109875.  

Looks like Application "Rosetta Beta 6.06" tasks are using 2.5GB of RAM each! That becomes a bit inefficient when you have 128 cores in a computer and 128GB RAM (only 46/128 cores used). Earlier beta tasks and "Rosetta 4.20" tasks consume less than 0.5GB each (and all 128 cores are used).
Is it possible to limit the RAM usage per task, so I can use all cores again?

The recent beta 6.06 tasks are now using less than 1GB (600MB compressed). Thank you for fixing the RAM size!
Now I'm able to use all cores again
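
For anyone checking the same numbers on their own host, here is a back-of-the-envelope sketch of how many tasks fit in RAM. It assumes the client is allowed to use 90% of memory. That fraction is an assumption chosen because it reproduces the 46 reported above; the real value comes from your own computing preferences. The function name `max_concurrent_tasks` is made up for illustration.

```python
import math


def max_concurrent_tasks(cores: int, ram_gb: float, gb_per_task: float,
                         usable_fraction: float = 0.9) -> int:
    """Estimate how many tasks can run at once.

    usable_fraction is an assumed memory limit (90% here), not a value
    taken from the posts; adjust it to match your own preferences.
    """
    by_memory = math.floor(ram_gb * usable_fraction / gb_per_task)
    # Never run more tasks than there are cores.
    return min(cores, by_memory)


if __name__ == "__main__":
    # 128 cores, 128GB RAM, 2.5GB per beta task.
    print(max_concurrent_tasks(128, 128, 2.5))
    # Same host with tasks at 0.5GB each: memory is no longer the limit.
    print(max_concurrent_tasks(128, 128, 0.5))
```

With 2.5GB tasks memory is the bottleneck; at 0.5GB per task the core count is the limit instead.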
ID: 109908 · Rating: 0
Bill Swisher

Joined: 10 Jun 13
Posts: 36
Credit: 33,198,055
RAC: 32,750
Message 109909 - Posted: 24 Oct 2024, 17:07:01 UTC

It appears that they (whoever they are) have resolved the massive memory gobbling. Do you think I would be wise to remove the limitation on the beta runs? I currently have it limited to only 6 per computer.
ID: 109909 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 109912 - Posted: 25 Oct 2024, 4:15:41 UTC - in response to Message 109905.  


It's probably already errored out by now, but with all those errors and over 2.5 days of running without starting, you should abort it if it's still going.
It hasn't started, let alone stood any chance of finishing. Let your core have something more productive to run.

Fortunately this seems to be a one-off and other tasks are processing as expected.

I think so.
It's possible it ran short of RAM as some tasks are demanding high amounts recently, but better to think of it as a one-off and just move on.
ID: 109912 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 109913 - Posted: 25 Oct 2024, 4:20:23 UTC - in response to Message 109906.  

And the boinc-process host is down again.
Still dead, so still no work being Validated.

It came back about 8 hrs ago.
Everything has nearly cleared down now.
Some tasks became available, but they have all been gobbled up again.
All very hand-to-mouth.
ID: 109913 · Rating: 0
Matthew Tireman

Joined: 24 Mar 20
Posts: 6
Credit: 387,215
RAC: 517
Message 109929 - Posted: 27 Oct 2024, 16:24:38 UTC
Last modified: 27 Oct 2024, 16:25:21 UTC

:/
ID: 109929 · Rating: 0
Matthew Tireman

Joined: 24 Mar 20
Posts: 6
Credit: 387,215
RAC: 517
Message 109930 - Posted: 27 Oct 2024, 16:24:39 UTC

One of my systems (Phenom II X6 1065T) fails all Rosetta Beta 6 tasks yet is fine with Rosetta 4 tasks.

The tasks fail almost immediately.

I've:
Reinstalled BOINC
Enabled virtualization
Reinstalled VirtualBox twice
If this isn't solvable, is it possible to disable Rosetta Beta 6 tasks specifically on this machine?
ID: 109930 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,402
Message 109932 - Posted: 27 Oct 2024, 19:14:59 UTC - in response to Message 109930.  

One of my systems (Phenom II X6 1065T) fails all Rosetta Beta 6 tasks yet is fine with Rosetta 4 tasks.

The tasks fail almost immediately.

I've:
Reinstalled BOINC
Enabled virtualization
Reinstalled VirtualBox twice
If this isn't solvable, is it possible to disable Rosetta Beta 6 tasks specifically on this machine?


I tried to look up which of your systems that is to see if I could help. The information I found by clicking on your author name did not include the system type (Phenom II), only items like the CPU and GPU types, so I couldn't help.
ID: 109932 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 109935 - Posted: 27 Oct 2024, 23:21:15 UTC - in response to Message 109932.  

One of my systems (Phenom II X6 1065T) fails all Rosetta Beta 6 tasks yet is fine with Rosetta 4 tasks.

The tasks fail almost immediately.

I've:
Reinstalled BOINC
Enabled virtualization
Reinstalled VirtualBox twice
If this isn't solvable, is it possible to disable Rosetta Beta 6 tasks specifically on this machine?

I tried to look up which of your systems that is to see if I could help. The information I found by clicking on your author name did not include the system type (Phenom II), only items like the CPU and GPU types, so I couldn't help.

It looks to be this one.
I can't help either.
Some tasks crashed with their wingman too, but others completed fully and successfully.

The only thing I might ask about is if that PC is overclocked or old and maybe overheating.
Might it need a clean-out of dust from fans and vents in order to run cooler? Can't do any harm.

But I'm guessing - I have no idea what's wrong.
And there's no way to disable Rosetta Beta tasks only.
If Matthew doesn't mind the wasted bandwidth, let them crash out in a few seconds and someone else will have a go at them while he moves on with other tasks that do run successfully.
ID: 109935 · Rating: 0
Bill Swisher

Joined: 10 Jun 13
Posts: 36
Credit: 33,198,055
RAC: 32,750
Message 109936 - Posted: 28 Oct 2024, 3:18:57 UTC - in response to Message 109935.  

And there's no way to disable Rosetta Beta tasks only.


Ahh...but there is! At least under Linux. Thanks to the beta jobs asking for 2+GB of memory, I took the hint(s) and restricted them. But they've fixed that problem, so it turned into a "learning experience" and I'm limiting the number of Einstein@Home jobs now. Details available via private message if anyone is interested in how.
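
The exact method isn't spelled out in this post, so the following is only one possible approach, not necessarily the one used here: BOINC reads an app_config.xml file from the project's directory, and its max_concurrent setting caps how many tasks of one application run at once. The sketch below generates such a file. The application name "rosetta_beta" is an assumption (check the name in your own client_state.xml), and the helper name `build_app_config` is made up for illustration. Note that this limits the beta tasks rather than switching them off entirely.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps concurrent tasks for one app.

    app_name must match the application's short name on your client;
    the value used below is an assumption, not taken from the thread.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # one element per line; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Limit of 6 per computer, as mentioned earlier in the thread.
    print(build_app_config("rosetta_beta", 6))
```

Save the output as app_config.xml in the Rosetta project folder inside the BOINC data directory, then have the client re-read its config files or restart it.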
ID: 109936 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1684
Credit: 17,941,438
RAC: 22,992
Message 109938 - Posted: 28 Oct 2024, 6:17:22 UTC

The only thing to try that comes to mind is to reset the Project.
If one of the data files needed for Beta Tasks has become corrupted, that can cause the problem you're experiencing. Resetting the project will release all downloaded work, and clear out all existing application & database files & re-download them from the project from scratch.
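
On a machine without BOINC Manager, the same reset can be done from the command line with boinccmd. A minimal sketch follows, assuming boinccmd is on the PATH and that the project URL shown is the one your client is attached to (copy the URL from your own client if unsure). The helper names `reset_command` and `reset_project` are made up for illustration.

```python
import subprocess

# Assumed project URL; confirm it against the one your client lists.
ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def reset_command(project_url: str) -> list[str]:
    """Build the boinccmd call that resets one attached project."""
    return ["boinccmd", "--project", project_url, "reset"]


def reset_project(project_url: str) -> None:
    """Run the reset; raises CalledProcessError if boinccmd fails."""
    subprocess.run(reset_command(project_url), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, since a reset discards
    # all downloaded work for the project.
    print(" ".join(reset_command(ROSETTA_URL)))
```

Call reset_project only once you are happy to lose any tasks in progress.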
Grant
Darwin NT
ID: 109938 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1684
Credit: 17,941,438
RAC: 22,992
Message 109941 - Posted: 30 Oct 2024, 10:17:54 UTC

boinc-process host has died yet again...
Grant
Darwin NT
ID: 109941 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,633,151
RAC: 7,242
Message 109942 - Posted: 30 Oct 2024, 16:13:06 UTC - in response to Message 109941.  

boinc-process host has died yet again...


I missed it a little
ID: 109942 · Rating: 0
OffDutyTaoist

Joined: 10 Oct 06
Posts: 3
Credit: 1,988,103
RAC: 616
Message 109944 - Posted: 30 Oct 2024, 19:27:15 UTC

My Pixel 6 recently was having issues with Rosetta v4.20 arm-android-linux-gnu. Specifically:

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_05_2997716_34_0

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_12_2997716_33_0

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_07_2997716_32_0

When they started, each would get to about 1.5 to 1.75% completed, then my phone would reset and the task would start over at 0%. I aborted all three; in retrospect I should have paused two and tried to isolate exactly which one was causing the issue. But I have some other stuff going on and acted out of frustration, so that one is on me. If I can provide anything else that might help, let me know.
ID: 109944 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 109946 - Posted: 30 Oct 2024, 23:32:11 UTC - in response to Message 109941.  

boinc-process host has died yet again...

Still down, but two batches of tasks issued and 1m+ queued up to process
ID: 109946 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 109948 - Posted: 1 Nov 2024, 0:42:10 UTC - in response to Message 109946.  

boinc-process host has died yet again...

Still down, but two batches of tasks issued and 1m+ queued up to process

Still down, with 400k awaiting validation now. The front page info also seems to have frozen (no update for about 18 hrs), while the Server Status page still seems ok. For now.
ID: 109948 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org