Message boards : Number crunching : No "finished" file

Ed Machak (Joined: 10 Nov 16, Posts: 7, Credit: 17,339,411, RAC: 0)

For some time now, the following messages have been showing up in the event log:

> 2/11/2020 7:47:32 PM | Rosetta@home | Task ennist_dLHa_0001_0001_0008_loop_0001_0001_fold_SAVE_ALL_OUT_891273_110_0 exited with zero status but no 'finished' file
>
> 2/11/2020 7:47:32 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.

I have reset the project, but these messages persist. Can this error report be ignored, or does it mean I'll get no credit for the work?

Ed Machak

Mad_Max (Joined: 31 Dec 09, Posts: 209, Credit: 26,071,270, RAC: 16,685)

This error has been here for years now. It happens from time to time, and there is no clear way to fix it. There is no need to do a full reset of the project: a simple BOINC restart (a full restart of the client, not just the Manager GUI) or a computer reboot fixes it too. But it will come back again after some time.

James W (Joined: 25 Nov 12, Posts: 130, Credit: 1,766,254, RAC: 0)

> This error has been here for years now.

Agreed, this error is a pain, and for me it means the task takes longer to process. How often it happens seems to depend on the type of file being crunched. I only see it with the Rosetta Mini v3.78 application.

Jim1348 (Joined: 19 Jan 06, Posts: 881, Credit: 52,257,545, RAC: 0)

> 2/11/2020 7:47:32 PM | Rosetta@home | Task ennist_dLHa_0001_0001_0008_loop_0001_0001_fold_SAVE_ALL_OUT_891273_110_0 exited with zero status but no 'finished' file

I seem to recall that this is the one where the disk drive cannot write the results fast enough for them to be available when needed (or that may be a different error). A write cache or an SSD could help.

Mad_Max (Joined: 31 Dec 09, Posts: 209, Credit: 26,071,270, RAC: 16,685)

Yes, it is somehow related to disk speed and occurs much less frequently on SSDs, but it still happens sometimes even on SSDs. On an HDD with a lot of concurrent R@H WUs running, it happens much more often.

It looks like the root of the problem is a very old bug somewhere in the Rosetta software that causes the app to crash if it cannot write to disk immediately, instead of just waiting a few seconds while the disk is busy handling other requests. But the devs do not bother to track it down and fix it, so it has kept crashing the app and wasting generated results for years now. Moving data to SSDs, enabling the disk write cache, reducing the max_concurrent number of running tasks, etc. are all just partial workarounds: they help mitigate the problem, but not 100%, and they do not fix the problem itself.
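The behaviour Mad_Max says the app should have, waiting and retrying when a write does not go through at once instead of giving up, can be sketched as a generic pattern. This is not Rosetta's code and does not claim to reproduce the real bug: `write_with_retry`, its attempt count, and its backoff delays are hypothetical, and it assumes a failed write surfaces as an `OSError`.

```python
import os
import tempfile
import time


def write_with_retry(path: str, data: bytes, attempts: int = 5, base_delay: float = 0.5) -> int:
    """Write `data` to `path`, waiting and retrying if the write raises OSError.

    Hypothetical helper: the attempt count and delays are illustrative only.
    Returns the attempt number that succeeded; re-raises after the last failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    tmp_path = path + ".tmp"
    for attempt in range(1, attempts + 1):
        try:
            # Write to a side file and force it to disk...
            with open(tmp_path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            # ...then rename it over the target, so readers never see a partial file.
            os.replace(tmp_path, path)
            return attempt
        except OSError:
            if attempt == attempts:
                raise
            # Back off exponentially (0.5 s, 1 s, 2 s, ... by default) while the disk is busy.
            time.sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "result.out")
        used = write_with_retry(target, b"example result\n")
        print(used)  # the first attempt succeeds on a writable directory
```

The temporary-file-plus-rename step is a common design choice so that a reader polling for the output never picks up a half-written file; the retry loop is what turns a momentarily busy disk into a short delay rather than a failed task.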

Trotador (Joined: 30 May 09, Posts: 108, Credit: 291,214,977, RAC: 0)

> Yes, it is somehow related to disk speed and occurs much less frequently on SSDs, but it still happens sometimes even on SSDs.

I had in mind that the above recommendations were related to another classic BOINC error, "finish file present too long", which is more prone to occur on hosts with many cores/threads.
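The two messages describe opposite mismatches between a task's process and its 'finished' marker file. The following is a simplified, hypothetical model of that distinction, not BOINC's actual client code: `TaskSnapshot`, `classify`, and the `grace_seconds` timeout are assumptions for illustration; only the two message texts come from this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSnapshot:
    """Hypothetical view of one task as a client might observe it."""
    exit_status: Optional[int]        # None while the process is still running
    finish_file_age: Optional[float]  # seconds since the 'finished' file appeared; None if absent


def classify(task: TaskSnapshot, grace_seconds: float = 300.0) -> str:
    """Map a snapshot to one of the conditions discussed in the thread.

    grace_seconds is an assumed timeout, not BOINC's real value.
    """
    if task.exit_status is None:
        # Still running: only a problem if the app already wrote its finish
        # marker and then failed to exit within the grace period.
        if task.finish_file_age is not None and task.finish_file_age > grace_seconds:
            return "finish file present too long"
        return "running"
    if task.exit_status != 0:
        return "error exit"
    if task.finish_file_age is None:
        # Clean exit code, but the completion marker was never written.
        return "exited with zero status but no 'finished' file"
    return "completed"


if __name__ == "__main__":
    print(classify(TaskSnapshot(exit_status=0, finish_file_age=None)))
    print(classify(TaskSnapshot(exit_status=None, finish_file_age=600.0)))
```

In this model, one error is an exit without the marker and the other is a marker without an exit, which is why disk-speed and many-thread explanations may apply to one more than the other.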

Jim1348 (Joined: 19 Jan 06, Posts: 881, Credit: 52,257,545, RAC: 0)

> I had in mind that the above recommendations were related to another classic BOINC error, "finish file present too long", which is more prone to occur on hosts with many cores/threads.

That could be it, or it may be both. SSDs are not always fast. They usually write quickly, but sometimes a write arrives while they are doing garbage collection, consolidating blocks, or other internal housekeeping. In those cases the write can be delayed by half a second or so (you can sometimes see a desktop app pause).

But I use a write cache on all of my machines. I originally set it up to protect the SSDs from the high write rates of some projects, but it happens to solve a variety of other problems too. On Windows, I use the Samsung Magician utility, which includes a small cache (about 1 GB), or else PrimoCache for larger caches with longer write delays. Linux has its own cache; I just use the commands to increase its size and duration, usually to at least 4 GB and 30 minutes. I don't recall seeing either of those errors for a while.
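Jim1348 doesn't say which Linux commands he uses. One common way to enlarge and lengthen the kernel's write-back cache is the `vm.dirty_*` sysctls; assuming that is the mechanism, the sketch below only converts "4 GB and 30 minutes" into the units those settings take (bytes and hundredths of a second). Treating 1 GB as 1024^3 bytes and setting `vm.dirty_bytes` to twice the background threshold (`hard_limit_factor`) are hypothetical choices, not his configuration.

```python
def dirty_cache_sysctls(cache_gib: float, delay_minutes: float, hard_limit_factor: float = 2.0) -> dict:
    """Translate a target cache size and write delay into vm.dirty_* values.

    hard_limit_factor is a hypothetical choice: it keeps vm.dirty_bytes
    (where writers start to block) above vm.dirty_background_bytes
    (where background flushing starts).
    """
    background_bytes = int(cache_gib * 1024**3)       # GiB -> bytes
    expire_centisecs = int(delay_minutes * 60 * 100)  # minutes -> 1/100 s
    return {
        "vm.dirty_background_bytes": background_bytes,
        "vm.dirty_bytes": int(background_bytes * hard_limit_factor),
        "vm.dirty_expire_centisecs": expire_centisecs,
    }


if __name__ == "__main__":
    # Print lines in sysctl.conf syntax for a 4 GiB cache held for up to 30 minutes.
    for key, value in dirty_cache_sysctls(4, 30).items():
        print(f"{key} = {value}")
```

For 4 GB and 30 minutes this yields 4294967296 bytes for the background threshold and 180000 centiseconds for the expiry. Lines in that form can be placed in a file under `/etc/sysctl.d/` and loaded with `sysctl --system`, or set one at a time with `sysctl -w`.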
©2024 University of Washington
https://www.bakerlab.org