Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2123 Credit: 41,206,290 RAC: 10,273 |
"Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run." "Which is exactly what my advice does - it reduces the Runtime for each and every Task, for all projects that the person does. It doesn't just do it for one Project, but for all of them." In this case, you're assuming the problem when there's no evidence of it being the one you describe based on the symptom. A while back you rightly pointed out the symptom was one of scheduling. Solve the scheduling issue, where Rosetta knowingly misleads Boinc, in the way I described - the end. If you were to take a look at adrianxw's Rosetta tasks (where, tbf, he only seems to be running Rosetta tasks atm, which mislead Boinc in the other direction), no deadlines are being missed any more, so panic mode won't be arising, let alone deadlines being missed. Nor will they if his tasks are all Beta ones. You also ignore the fact that, with all cores usable to Boinc tasks, the Folding tasks are <additional> to those tasks. I don't know how many Folding tasks run at a time - I assume it's one - so the inefficiency you see in Boinc tasks is entirely taken up by a 9th task running at normal priority on an 8-core machine. How does that pan out? Neither of us knows for sure, but I'm going to suggest that almost all of that "inefficiency" disappears through the processing of a 9th task on an 8-core machine. "It's obviously true if you actually understand what is going on." "But for anyone else that's been reading these posts..." "Well, that's obviously not true." Having cut off quoting the critical part of what I wrote about how the cores unutilised by Boinc are made use of, this isn't a statement on what I wrote but on what I explicitly didn't write, so worthless. "One final attempt to point out the obvious-" On a different situation no-one's talking about... irrelevant. When someone raises that as their issue, bring it up again. Then it might have a point. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,602,729 RAC: 8,692 |
Still the same error, again and again: ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictApplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
...time to end it. I have tried my best to help you understand, but every point you make shows that you still don't understand what is happening, so it really is time for me to give up once and for all. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
"Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present." Now up to 24,128, and the Server Status showing several processes on boinc-process not running. Seems to be nothing but recurring issues with that server lately. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
Now all processes on boinc-process are down and Waiting for Validation is now up to 35,496. "Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present." "Now up to 24,128, and the Server Status showing several processes on boinc-process not running." Maybe it's gone down in sympathy with the ralph server over on Ralph. It's been down for 4-5 days now. Grant Darwin NT |
kotenok2000 Joined: 22 Feb 11 Posts: 259 Credit: 493,761 RAC: 940 |
Everything is running as of 5 Jun 2024, 10:16:46 UTC |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
"Everything is running as of 5 Jun 2024, 10:16:46 UTC" 10 minutes earlier everything on boinc-process was dead. And the same with the ralph server at Ralph; it's showing life again as well. BTW - check the date/time stamp - that's for the Task application data. The server status data is this one - Remote daemon status as of 5 Jun 2024, 10:45:06 UTC. It would be good if these things were updated more often. Grant Darwin NT |
kotenok2000 Joined: 22 Feb 11 Posts: 259 Credit: 493,761 RAC: 940 |
They probably rebooted it. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
"They probably rebooted it." It'd be nice if they fixed whatever it was that keeps causing it to die so they don't need to keep rebooting it. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2123 Credit: 41,206,290 RAC: 10,273 |
"They probably rebooted it." "It'd be nice if they fixed whatever it was that keeps causing it to die so they don't need to keep rebooting it." It is very odd - it never used to happen. Anyway, glad it got sorted before too long and they didn't need a nudge this time, seeing as I'm 2 days late in finding out. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
New work at Ralph, with new errors. So some work has been done, but it looks like there's still quite a way to go. RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_d_pred_188_16900_2_1 <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> Codice di accesso non valido [The access code is invalid]. (0xc) - exit code 12 (0xc)</message> <stderr_txt> Traceback (most recent call last): File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv1rf2aapredict.py", line 733, in <module> with zipfile.ZipFile(args.z) as z: File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libzipfile.py", line 1268, in __init__ self._RealGetContents() File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libzipfile.py", line 1335, in _RealGetContents raise BadZipFile("File is not a zip file") zipfile.BadZipFile: File is not a zip file </stderr_txt> ]]> RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_d_pred_60_16900_5_1 <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> 'C:ProgramDataBOINC/projects/ralph.bakerlab.orgev0Scriptsactivate.bat' is not recognized as an internal or external command, operable program or batch file. </stderr_txt> ]]> Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2123 Credit: 41,206,290 RAC: 10,273 |
Total queued jobs on the front page down to 222k. Advance warning: we may be out of new tasks in the next 24hrs unless we get lucky again. Fingers crossed. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
Now out of new work. Also, although the Server status shows all green, there is a backlog of Tasks waiting on Validation: 3,078 at the moment. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,821,705 RAC: 22,765 |
"Also, although the Server status shows all green, there is a backlog of Tasks waiting on Validation." Whatever was going on before, the backlog has now cleared. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2123 Credit: 41,206,290 RAC: 10,273 |
"Now out of new work." This has been the best run we've had for a couple of years - bound to end at some point once everyone's offline cache runs down. It's at this point my 12hr runtime setting ekes out my remaining work as far as possible. What I'd re-emphasise is that the default runtime for tasks has fallen to 3hrs for some reason, which I believe to be a mistake and which contradicts the forced Boinc setting of 8hrs. As such, people should go into Boinc's Your Account option, select Rosetta@home preferences and change Target CPU run time to an explicit 8hrs rather than "not selected". This will almost treble how long tasks run and extend the life of work batches so that we run out less often, if at all, while almost trebling the credit we get for tasks too. This should be considered a high priority for everyone, imo. |
RDTSC Joined: 29 Jan 24 Posts: 4 Credit: 738,712 RAC: 11,585 |
https://boinc.bakerlab.org/rosetta/ Their home page could do with some updates; the last post was almost two years ago. I get it, web hosting and administration are expensive, along with preparing, running, and maintaining massive job servers. It just seems to me that a little grease, at the right points of this machine, would greatly help it function. |
kotenok2000 Joined: 22 Feb 11 Posts: 259 Credit: 493,761 RAC: 940 |
Hal jobs run for three hours because subtasks are short and produce many results per task. Other jobs run for 8 hours. |
Sid Celery Joined: 11 Feb 08 Posts: 2123 Credit: 41,206,290 RAC: 10,273 |
"Hal jobs run for three hours because subtasks are short and produce many results per task." No. All mine run for 12hrs because I set them to run for 12hrs. They don't hit a top limit of decoys and end because some internal limit has been reached. Rosetta Beta 6.04 tasks wrongly default to 3hrs CPU runtime while Rosetta v4.20 tasks rightly default to 8hrs. So set the Rosetta@home Target CPU Runtime explicitly to 8hrs, rather than leaving it at 'not selected', so that CPU runtime matches what Boinc is told to assume. Do more work, get more credits; Boinc schedules more correctly and sooner; batches of tasks issued by Rosetta last longer; Rosetta tasks run out less often. <Everyone> wins. The alternative is what we have now - no new tasks. Everyone loses. |
kotenok2000 Joined: 22 Feb 11 Posts: 259 Credit: 493,761 RAC: 940 |
Tasks starting with RosettaVS run for 8 hours for me. |
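
For anyone wanting to put rough numbers on the 3hr vs 8hr runtime discussion above, here is a minimal sketch of the arithmetic behind "almost treble". It assumes a fixed pool of queued tasks and a fixed number of cores that each run one task at a time for exactly the Target CPU run time. The function name `batch_lifetime_hours` and the core count are made up for illustration and are not taken from Rosetta@home or BOINC; only the 222k queue figure comes from the thread.

```python
def batch_lifetime_hours(queued_tasks: int, active_cores: int,
                         target_runtime_hours: float) -> float:
    """Wall-clock hours until the queue is empty, assuming every core
    runs one task at a time for exactly the target runtime."""
    if queued_tasks < 0 or active_cores <= 0 or target_runtime_hours <= 0:
        raise ValueError("need queued_tasks >= 0, active_cores > 0, runtime > 0")
    return queued_tasks * target_runtime_hours / active_cores


QUEUED_TASKS = 222_000   # the "222k" queue figure quoted in the thread
ACTIVE_CORES = 50_000    # hypothetical; the real figure is not given here

for hours in (3, 8, 12):
    lifetime = batch_lifetime_hours(QUEUED_TASKS, ACTIVE_CORES, hours)
    print(f"{hours:>2}h target runtime: queue lasts about {lifetime:.1f}h")

# Ratio between the 8h and 3h settings: 8 / 3, i.e. "almost treble".
print(f"8h vs 3h: x{8 / 3:.2f}")
```

This is deliberately crude: it ignores deadlines, hosts dropping in and out, and new batches arriving. It only shows that, in this model, how long a batch lasts scales linearly with the runtime setting.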
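
On the `BadZipFile` traceback from Ralph earlier in the thread: that exception means the file handed to `zipfile.ZipFile` was not a valid zip archive, which could be a truncated or failed download, though the thread does not establish the cause. As a hedged sketch only, this is how a script could check its input up front with Python's standard `zipfile` module and report something more useful. `open_input_archive` is a hypothetical helper, not code from the Ralph application.

```python
import zipfile
from pathlib import Path


def open_input_archive(path):
    """Open a zip archive, failing early with a descriptive message.

    Hypothetical helper for illustration; not part of any BOINC or
    Rosetta code.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"input archive not found: {p}")
    # is_zipfile() looks for the zip end-of-central-directory record,
    # so it rejects empty, truncated, or non-zip files.
    if not zipfile.is_zipfile(p):
        size = p.stat().st_size
        raise zipfile.BadZipFile(
            f"{p} is not a zip archive ({size} bytes); "
            "it may be truncated or may be a different file type"
        )
    return zipfile.ZipFile(p)
```

Used as `with open_input_archive(some_path) as z: ...`, this still raises `BadZipFile` for a bad file, but the message includes the path and size, which makes reports like the one above easier to diagnose.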
©2024 University of Washington
https://www.bakerlab.org