Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Previous · 1 . . . 294 · 295 · 296 · 297 · 298 · 299 · 300 . . . 313 · Next

Matthew Tireman
Joined: 24 Mar 20
Posts: 6
Credit: 387,215
RAC: 4
Message 109930 - Posted: 27 Oct 2024, 16:24:39 UTC

One of my systems (Phenom II X6 1065T) fails all Rosetta Beta 6 tasks, yet it's fine with Rosetta 4 tasks.

It fails the tasks almost immediately.

I've:
Reinstalled BOINC
Enabled virtualization
Reinstalled VirtualBox twice
If this isn't solvable, is it possible to disable Rosetta Beta 6 tasks specifically on this machine?
ID: 109930 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1235
Credit: 14,341,506
RAC: 292
Message 109932 - Posted: 27 Oct 2024, 19:14:59 UTC - in response to Message 109930.  

I tried to look up which of your systems that is, to see if I could help. The information I found by clicking on your author name did not include the system type (Phenom II), only items like the CPU and GPU types, so I couldn't help.
ID: 109932 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109935 - Posted: 27 Oct 2024, 23:21:15 UTC - in response to Message 109932.  

It looks to be this one.
I can't help either.
Some tasks crashed with their wingman too, but others completed fully and successfully.

The only thing I might ask about is if that PC is overclocked or old and maybe overheating.
Might it need a clean-out of dust from fans and vents in order to run cooler? Can't do any harm.

But I'm guessing - I have no idea what's wrong.
And there's no way to disable Rosetta Beta tasks only.
If Matthew doesn't mind the wasted bandwidth, let them crash out in a few seconds and someone else will have a go at them while he moves on with other tasks that do run successfully.
ID: 109935 · Rating: 0
Bill Swisher
Joined: 10 Jun 13
Posts: 43
Credit: 35,474,078
RAC: 33,162
Message 109936 - Posted: 28 Oct 2024, 3:18:57 UTC - in response to Message 109935.  

And there's no way to disable Rosetta Beta tasks only.

Ahh... but there is! At least under Linux. Since the Beta jobs were asking for 2+ GB of memory, I took the hint(s) and restricted them. But they've fixed that problem, so it turned into a "learning experience", and I'm now limiting the number of Einstein@Home jobs instead. Details available via private message if anyone is interested in how.
ID: 109936 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109938 - Posted: 28 Oct 2024, 6:17:22 UTC

The only thing to try that comes to mind is to reset the project.
If one of the data files needed for Beta tasks has become corrupted, that could cause the problem you're experiencing. Resetting the project discards all downloaded work, clears out all existing application & database files, and re-downloads them from the project from scratch.
Grant
Darwin NT
ID: 109938 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109941 - Posted: 30 Oct 2024, 10:17:54 UTC

boinc-process host has died yet again...
Grant
Darwin NT
ID: 109941 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2014
Credit: 9,842,981
RAC: 4,009
Message 109942 - Posted: 30 Oct 2024, 16:13:06 UTC - in response to Message 109941.  

boinc-process host has died yet again...


I missed it a little
ID: 109942 · Rating: 0
OffDutyTaoist
Joined: 10 Oct 06
Posts: 3
Credit: 1,998,088
RAC: 24
Message 109944 - Posted: 30 Oct 2024, 19:27:15 UTC

My Pixel 6 was recently having issues with Rosetta v4.20 arm-android-linux-gnu tasks. Specifically:

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_05_2997716_34_0

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_12_2997716_33_0

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_07_2997716_32_0

When they started, they would get to about 1.5 to 1.75% complete, then reset my phone and start over at 0%. I aborted all three; in retrospect, I should have paused two of them and tried to isolate whether one in particular was causing the issue. But I have some other stuff going on and acted out of frustration, so that one is on me. If I can provide anything else that might help, let me know.
ID: 109944 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109946 - Posted: 30 Oct 2024, 23:32:11 UTC - in response to Message 109941.  

boinc-process host has died yet again...

Still down, but two batches of tasks issued and 1m+ queued up to process
ID: 109946 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109948 - Posted: 1 Nov 2024, 0:42:10 UTC - in response to Message 109946.  

Still down, 400k awaiting validation now, and the front page info also seems to have frozen (no update for ~18 hrs), while the Server Status page still seems OK. For now.
ID: 109948 · Rating: 0
tgbauer
Joined: 5 Jan 06
Posts: 11
Credit: 103,504,587
RAC: 18,910
Message 109949 - Posted: 1 Nov 2024, 4:44:13 UTC - in response to Message 109930.  

One of my systems (phenom ii x6 1065t) fails all Rosetta BETA 6 tasks yet is fine with Rosetta 4 tasks.

It almost immediately fails the tasks.

I'm seeing something similar on my older 64-bit system: Beta 6.06 tasks fail within 1 second without producing output, but all 4.20 tasks complete as expected. "Reset project" didn't help.
"
27-Oct-2018 17:57:12 [---] Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ [Family 15 Model 75 Stepping 2]
27-Oct-2018 17:57:12 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall
27-Oct-2018 17:57:12 [---] OS: Linux: 4.4.0-138-generic
"

"
Application: Rosetta Beta 6.06
Name: 8aahal_r_hal_8aa_3jp5416_d40_1_0001_1_SAVE_ALL_OUT_2999122_54
State: Computation error
Received: Fri 01 Nov 2024 12:26:18 AM EDT
Report deadline: Sun 03 Nov 2024 11:26:18 PM EST
Estimated computation size: 80,000 GFLOPs
CPU time: 00:00:00
Elapsed time: 00:00:01
Executable: rosetta_beta_6.06_x86_64-pc-linux-gnu
"

For some reason I'm not able to grab stderr.txt in time. Is there anything else I can look at to find out why the tasks fail?
ID: 109949 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109951 - Posted: 1 Nov 2024, 5:46:01 UTC - in response to Message 109948.  

Almost half a million waiting for Validation now.
Grant
Darwin NT
ID: 109951 · Rating: 0
tgbauer
Joined: 5 Jan 06
Posts: 11
Credit: 103,504,587
RAC: 18,910
Message 109952 - Posted: 1 Nov 2024, 7:25:52 UTC - in response to Message 109949.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=1587071539

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.06_x86_64-pc-linux-gnu @8aahal_r_hal_8aa_3jp5416_d40_1_0001_1.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_f5ae1de8e1/database

</stderr_txt>
]]>


ID: 109952 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109953 - Posted: 1 Nov 2024, 8:07:30 UTC - in response to Message 109952.  

From a previous thread:
Under Linux, signal 11 (SIGSEGV) means that the program made an invalid memory access, for example trying to execute something that was not marked as executable code. The project administrators should use the dump to determine where the program got the address of what it was trying to execute, and then trace backwards from there.
Other than running the latest kernel and/or version of your distribution (or an earlier one if the latest ones have deprecated your older CPU), I can't think of anything else to try.
Even if someone with a similar system running Windows checked whether that application has the same issue on the same hardware, they're no longer doing any development work on this application, so I don't see anything happening to resolve the issue.
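For anyone decoding messages like the "process got signal 11" in the stderr above, here is a minimal sketch in Python (standard library only) that turns the number into its signal name. The helper name signal_name is hypothetical, and the mapping comes from the running platform's own signal table, so the numbers are only guaranteed to match what Linux reports when run on Linux.

```python
import re
import signal

# Matches BOINC's wording, e.g. "process got signal 11".
SIGNAL_RE = re.compile(r"process got signal (\d+)")


def signal_name(stderr_text):
    """Return the signal name (e.g. 'SIGSEGV') found in BOINC stderr text, or None."""
    match = SIGNAL_RE.search(stderr_text)
    if match is None:
        return None
    try:
        # signal.Signals is this platform's table of signal numbers.
        return signal.Signals(int(match.group(1))).name
    except ValueError:
        # The number is not a signal this platform defines.
        return None


if __name__ == "__main__":
    print(signal_name("<message>\nprocess got signal 11</message>"))  # SIGSEGV on Linux
```

Pasting the whole stderr block works, since the regular expression searches the text rather than requiring an exact line.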
Grant
Darwin NT
ID: 109953 · Rating: 0
tgbauer
Joined: 5 Jan 06
Posts: 11
Credit: 103,504,587
RAC: 18,910
Message 109954 - Posted: 1 Nov 2024, 8:21:22 UTC - in response to Message 109953.  

It looks like it might be the same lack-of-SSSE issue that affected 4.08: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13658&postid=92557#92557
ID: 109954 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109955 - Posted: 1 Nov 2024, 10:24:47 UTC - in response to Message 109954.  

It looks like it might be the same lack-of-SSSE issue that affected 4.08: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13658&postid=92557#92557
Very possible.
The Beta application was developed long after the Rosetta application, quite possibly by a different developer, who decided that SSSE instructions would be the minimum supported (so no support for CPUs a bit over 15 years old at the time of the Rosetta Beta application's release).
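If the missing-instruction theory is right, a Linux user can check whether a CPU advertises SSSE3 (Supplemental SSE3) by looking for the ssse3 flag, which /proc/cpuinfo lists on its flags line and BOINC's event log repeats after "Processor features:". Below is a minimal Python sketch; it assumes SSSE3 is the requirement (the thread suspects this but nobody confirms it), and the helper names are hypothetical. The Athlon 64 X2 feature list quoted earlier shows pni (plain SSE3) but no ssse3. Treat the result as a clue rather than proof: executing an unsupported instruction normally raises SIGILL (signal 4), not the signal 11 these tasks reported.

```python
import os

# Assumption: the Beta 6.x application needs SSSE3. This is the thread's
# theory, not a documented requirement; add flags here to test others.
REQUIRED_FLAGS = ("ssse3",)
BOINC_MARKER = "Processor features:"


def parse_flags(text):
    """Return the CPU flag set from /proc/cpuinfo text or a BOINC event-log line."""
    for line in text.splitlines():
        if line.startswith("flags"):
            # /proc/cpuinfo on x86: "flags\t\t: fpu vme ..."
            return set(line.partition(":")[2].split())
        if BOINC_MARKER in line:
            # BOINC event log: "... [---] Processor features: fpu vme ..."
            return set(line.split(BOINC_MARKER, 1)[1].split())
    return set()


def missing_flags(flags, required=REQUIRED_FLAGS):
    """List the required flags that are absent from the given flag set."""
    return [flag for flag in required if flag not in flags]


def main(path="/proc/cpuinfo"):
    if not os.path.exists(path):
        print(path, "not found; this check only works on Linux.")
        return
    with open(path) as handle:
        missing = missing_flags(parse_flags(handle.read()))
    if missing:
        print("CPU does not advertise:", " ".join(missing))
    else:
        print("CPU advertises:", " ".join(REQUIRED_FLAGS))


if __name__ == "__main__":
    main()
```

Parsing the event-log line as well as /proc/cpuinfo means the same check can be run on a feature list someone has pasted into a post, as long as the line was not wrapped.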
Grant
Darwin NT
ID: 109955 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109956 - Posted: 1 Nov 2024, 10:26:09 UTC

I really wish they'd replace that boinc-process host (or at the very least restart it, yet again).
Grant
Darwin NT
ID: 109956 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109957 - Posted: 1 Nov 2024, 21:39:24 UTC

635k waiting for Validation and rising. Will we make it to 1 million?
Grant
Darwin NT
ID: 109957 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109958 - Posted: 2 Nov 2024, 1:52:54 UTC - in response to Message 109957.  
Last modified: 2 Nov 2024, 1:57:51 UTC

635k waiting for Validation and rising. Will we make it to 1 million?

Currently showing 663,306 and I was going to suggest we keep some kind of tally to see how high we can get...

...except, I've just looked and all servers are now showing as running on the server status page, so let's see if that starts reducing or whether it's a false reading.
The front page is still showing as frozen for some reason.

On the plus side, this is a particularly consistent run of work over the last week or so. Let's see what kind of a points boost we all eventually get.

All fun and games...

Edit: I've just checked, and across my whole team I have 120 tasks pending validation, but as I scrolled through there were definitely one or two tasks that have now received credit, so I think validation is starting to work through the very long queue. Boinc-process lives!
ID: 109958 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109959 - Posted: 2 Nov 2024, 2:16:37 UTC - in response to Message 109958.  

Boinc-process lives
Till the next time.

It'd be nice if they got the main page Server Status info updating again, but if it's one or the other then it's better having the Validators running while there is work available.
Grant
Darwin NT
ID: 109959 · Rating: 0

©2025 University of Washington
https://www.bakerlab.org