Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx
Author | Message |
---|---|
biodoc Joined: 19 Feb 06 Posts: 14 Credit: 30,717,792 RAC: 0 |
I've had ~20 of these tasks fail after 8 hours of computation time: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_04_900260_6_0. Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1124284684 I've aborted the others in my queue. System: Linux, 3900X processor, 64 GB RAM. |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
Yes, it is an "old" issue here: the WUs containing "cstwt_5.0_FT" often fail on Linux and do better on Windows. They overrun the computing time set in user preferences and either finish OK through the watchdog or get a "signal 11" and fail to validate. However, it is not deterministic: some batches complete almost entirely OK, others fail almost entirely. The units containing just "cstwt_5.0" complete OK. |
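To check this pattern against your own list of failed tasks, a small sketch like the following could tally task names by family. The helper names `family` and `tally` are hypothetical, and the code assumes the "_FT" suffix always follows "cstwt_5.0" directly, as it does in the task names quoted in this thread.

```python
from collections import Counter


def family(task_name: str) -> str:
    """Label a task name by the batch family discussed in this thread."""
    # Check the longer marker first, because "cstwt_5.0" is a prefix of it.
    if "cstwt_5.0_FT" in task_name:
        return "cstwt_5.0_FT"
    if "cstwt_5.0" in task_name:
        return "cstwt_5.0"
    return "other"


def tally(task_names) -> Counter:
    """Count how many task names fall into each family."""
    return Counter(family(name) for name in task_names)


# Task names copied from posts in this thread.
failed = [
    "rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_04_900260_6_0",
    "rb_03_16_18638_18457_ab_t000__h002_robetta_IGNORE_THE_REST_12_10_902207_18_0",
    "9v1nm_gb_c3143_9mer_gb_001352_SAVE_ALL_OUT_892356_222_0",
]
print(tally(failed))
```

Feeding it the names of both failed and completed tasks from one host would show whether the failures really concentrate in the "_FT" batches.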
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
Looking at my running tasks, I see over 100 units of this type. Let's see what happens; most of them have already gone beyond the 8 hours of processing time. |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
30 of them have failed with "signal 11". Something to be checked by investigators. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,603,316 RAC: 8,722 |
"Something to be checked by investigators." Waiting for Godot. |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
So, 30 failed units of this type on 29/02, 20 units on 01/03, 26 units on 02/03 and 20 units so far today. Tomorrow there will be fewer, as I've moved all hosts but one to other projects. Let's hope it is solved or explained when I come back to crunch again with more resources. |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
I'm on Linux, and BOINC froze because of task rb_02_21_16595_16419_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_05_896595_65_1. After 8 hours, I found that my host was idle! This is very bad. Please check it, it's not acceptable that a task blocks crunching on all DC projects. Sadly, I have set R@H to receive no more work. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,603,316 RAC: 8,722 |
"Please check it, it's not acceptable that a task blocks crunching on all DC projects." Do you see admins here? Do you see news about code? |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
Well, I see that Admin answers on Number Crunching threads. As a volunteer, I can spend my time putting together a solution to abort all tasks named like "*cstwt*". |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,603,316 RAC: 8,722 |
"Well, I see that Admin answers on Number Crunching threads." Do you mean Mod.Sense? He is a great guy, but he is NOT an admin. Admin posts only "Predictor of the day" and "News". David E.K. - latest post is March 2019. David Baker - latest post is December 2017. If you read forums, the "cstwt_5.0" wus has problems since February 2019. |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
"Do you mean Mod.Sense? He is a great guy, but he is NOT an admin." Here he is. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510&postid=91696#91696 https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510&postid=91703#91703 "If you read forums, the 'cstwt_5.0' wus has problems since February 2019." I see. I think I have already run into this issue, but I didn't remember it at all. The BOINC client stops responding and you can't even kill it. Although my client is standalone and runs as a user process (not a service), you have to kill it as superuser. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,603,316 RAC: 8,722 |
"Admin posts only 'Predictor of the day' and 'News'." These are not news about bugs. One is a news item, the other is info about RAM usage. I don't know which developer is working on bugs, but everything here seems frozen. |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
I'm going to abort all *cstwt_5.0* tasks with a bash script on Linux so that my contribution to R@H keeps going. Here is my script: https://pastebin.com/RKdZKhGx |
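For readers who cannot open the pastebin link, here is a minimal Python sketch of the same idea. It is not Luigi R.'s script. It assumes that `boinccmd --get_tasks` prints a `name:` line followed later by a `project URL:` line for each task, and it uses the documented `boinccmd --task <url> <task_name> abort` operation. The function names (`parse_tasks`, `select`, `abort_matching`) are hypothetical. The default pattern is narrower than `*cstwt_5.0*` and only matches the "_FT" tasks reported as failing in this thread.

```python
import fnmatch
import subprocess


def parse_tasks(get_tasks_output: str):
    """Yield (task_name, project_url) pairs from `boinccmd --get_tasks` text.

    Assumes each task block contains a "name:" line and, later,
    a "project URL:" line (an assumption about the output layout).
    """
    name = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name: "):          # "WU name: ..." does not match
            name = line[len("name: "):]
        elif line.startswith("project URL: ") and name is not None:
            yield name, line[len("project URL: "):]
            name = None


def select(tasks, pattern="*cstwt_5.0_FT*"):
    """Keep only the tasks whose name matches the shell-style pattern."""
    return [(n, u) for n, u in tasks if fnmatch.fnmatchcase(n, pattern)]


def abort_matching(pattern="*cstwt_5.0_FT*", dry_run=True):
    """Abort every matching task; with dry_run=True only print the commands."""
    out = subprocess.run(
        ["boinccmd", "--get_tasks"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, url in select(parse_tasks(out), pattern):
        cmd = ["boinccmd", "--task", url, name, "abort"]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `abort_matching()` first with the default `dry_run=True` prints the commands without touching any task. Depending on how the client is installed, `boinccmd` may need to be run from the BOINC data directory or with the RPC password.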
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
On Ubuntu 18.04 there are no problems running *cstwt_5.0* tasks. Ubuntu 18.04.4 LTS, kernel 4.15.0-88-generic, BOINC v7.9.3: OK. Ubuntu 14.04.6 LTS, kernel 4.4.0-142-generic, BOINC v7.2.42: dangerous. |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
Errors on Ubuntu 18.04 too. BOINC didn't crash though. rb_02_23_16774_16587_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_07_899045_59_1 rb_02_24_16778_16590_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_07_899052_43_1 |
[VENETO] sabayonino Joined: 16 Mar 10 Posts: 2 Credit: 3,820,525 RAC: 708 |
Same here rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_09_900260_85_0 and others |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 51 |
4 failures so far today: rb_03_16_18638_18457_ab_t000__h002_robetta_IGNORE_THE_REST_12_10_902207_18_0 rb_03_16_18636_18455_ab_t000__h001_robetta_IGNORE_THE_REST_10_13_902209_11_0 rb_03_16_18636_18455_ab_t000__h002_robetta_IGNORE_THE_REST_05_15_902210_2_0 rb_03_16_18637_18451_ab_t000__h002_robetta_IGNORE_THE_REST_09_19_902203_14_0 No mention of the cstwt_5.0_FT there. Windows 8.1 x64. <edit> 5 now... rb_03_16_18639_18459_ab_t000__h002_robetta_IGNORE_THE_REST_05_15_902222_16_0 </edit> Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 51 |
Another of these, though the title is a bit different. Was there not a recent server upgrade? 9v1nm_gb_c3143_9mer_gb_001352_SAVE_ALL_OUT_892356_222_0 |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 51 |
9v1nm_gb_c3143_9mer_gb_001352_SAVE_ALL_OUT_892356_222_0 Similar to the last one I mentioned. |
©2024 University of Washington
https://www.bakerlab.org