Message boards : Number crunching : Minirosetta 3.52
Author | Message |
---|---|
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
Not sure what's going on here.. https://boinc.bakerlab.org/rosetta/result.php?resultid=713895694 I have my WUs set to 1 hour but this WU went for 5.1hrs. Is this behavior normal with some types of tasks? On a side note, besides the long run time there's the small credit given: instead of getting 5 times more credit, as was asked for, the WU received 5 times less credit. This certainly isn't the first time I've seen this happen either, but thankfully it only seems to happen a few times a day. So is it normal? Any ideas? TIA |
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
Sorry, my bad, it's not the above link, it's this one.. https://boinc.bakerlab.org/rosetta/result.php?resultid=713894281 @ P . P . L: I have many of those validate errors too, but you still end up getting credited for them in the end. It's considered a normal part of the process. The way I understand it, these WUs are given credit by a script once every 24 hrs, but it doesn't show up in your results in the normal spot. If you wait 24-48hrs then click the task details link, you'll see right down at the very bottom that they did get credited eventually, after a day or two. Like this one of yours.. https://boinc.bakerlab.org/rosetta/result.php?resultid=713405534 |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
"Sorry my bad it's not the above link, it's this one.. https://boinc.bakerlab.org/rosetta/result.php?resultid=713894281" At the end of the log it looks like the watchdog had to force the task to shut down. The watchdog is a kind of fail-safe in case something goes wrong with the task and it doesn't complete properly. It happens very occasionally. Looks like you were unlucky with that one. |
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
Thanks for answering Sid. You're right, luckily there only seem to be about 4 or 5 a day, but it adds up. Especially when for some reason they only get credited with a fraction of what they should. Also it's hard to tell exactly how many there are, as I would have to search through 100s of tasks each day to find them. Rosetta's task viewing leaves much to be desired. As you'd know, on other projects you can click on, say, errors & get a list of errors, or do a search for a particular task name; that would be a big help here. PS I've just noticed that out of the blue I'm having comp errors like this https://boinc.bakerlab.org/rosetta/result.php?resultid=714075014 on one of my PCs. It's the exact same error as described in this post above. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6444&nowrap=true#76816 So far I've had more than a dozen of these over the last 4hrs or so. The strange thing is many of them end up getting validated by the next guy along. Not sure if that makes any difference or not, but in most cases where it does get validated it was by someone using Linux. Just a thought? I'll keep digging into it. Thanks again.. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
"Thanks for answering Sid. Your right luckily there only seems to be about 4 or 5 a day but it adds up. Especially when for some reason they only get credited with a fraction of what they should. Also it's hard to tell exactly how many there are as I would have to search through 100's of tasks each day to find them." You're right on that last bit about sorting tasks by error etc. 4 or 5 a day sounds a lot, so I just had a look at your machines and tasks and... holy moly! Why is it you have really fast computers but you run with just a 1hr runtime? You have near 1000 tasks per machine either complete or in progress! That must be taking up massive bandwidth at both ends. In the context of 1000s, the occasional few tasks going wrong is trivial. I thought 4-5 would be a lot. |
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
Hi Sid, Thank you again for your reply. The reason I use 1hr is simply because it credits the most, or at least it certainly seems to. While credits are nowhere near the top of my list for crunching, they are, like for most others, a side interest. On different occasions I checked other PCs with similar rigs to mine, & those I checked that were using longer times than me on average never got equal to what I was getting. And I figure since I'm helping anyway, what does it matter? After all, if using the 1hr option was bad, why would it still be an option? If it was imperative to get crunchers to use longer times then there would be an advantage for them to do so; at the moment there isn't. A simple way to achieve this, if it is indeed a project necessity, would be to offer crunchers a bonus for crunching longer times, for the extra risk & commitment involved in doing so. Things go pear-shaped here more than on most other projects. Other projects give bonuses for quick return & for doing long tasks, so it could be done here too. Plus the extra overhead at my end seems negligible when running larger size units. Although I didn't do a thorough check when I last used the longer times, so I could be wrong about it. Of course if it became a necessity for the project's good then I would oblige happily. As for the errors & problems I had, well, if I'm having them then there could well be who knows how many others with such errors, so I thought they would be worth reporting, especially since the majority of crunchers don't use the forums at all & when they find too many errors will just move on. Crunch-on Cheers Greg. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Short runtimes simply report and claim credit sooner. The level of runtime and network overhead is reduced by longer runtimes. Credit is very hard to compare, as different tasks can have different performance characteristics. There is some level of overhead just opening up the zip files and reference data used by a task, so the fewer times you do that in a day, the less overhead in the processing. Longer runtimes should be a smidge more efficient. They also reduce the number of tasks on your pending and completed lists, and reduce the number of hits to the project servers. I don't think anyone intended to imply a long runtime was "imperative". Just that it may offer some benefits for you by reducing the overall number of tasks, disk space requirements, etc. The underlying work results are the same, so there is no premium either way for return time or run length. The choice is there to help adapt to various usage scenarios. BEWARE, changes to runtime preference will affect tasks currently on your machine, and BOINC has to crunch a few with the new runtime preference before it accurately factors it in to its future work requests. So ideally you reduce your work buffer, and change runtime preference gradually over the course of a week. Then bump the buffer of work back up as desired. Also, there currently seems to be an issue with the 2 day preference, so I suggest using 1 day. Rosetta Moderator: Mod.Sense |
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
No worries Mod.Sense, I'll give that some thought & also try some longer timed WUs later & see what they're like now. Thanks for your time. Greg |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
Hi Sid, I didn't mean to make a big deal about bandwidth (though it is a side issue). I just meant it was such a chore to work through your task lists to see what issue you were having. 4 or 5 errors is a lot with the default 6hr runtimes, but at 1hr (with up to 16 cores running at a time) that's 4/5 out of 384 tasks a day, not 48. For what it's worth, I did see someone experiment with different runtimes on a machine and the differences were barely perceptible, with just the slightest advantage to longer runtimes (nothing conclusive either way though). On the back of that, and with the bandwidth usage in mind, I admit I decided to change from the default to 8hrs, but it's completely down to you. If you're looking to maximise credit, I guess it's worth bearing in mind that if you have a rogue task, like the one you first reported, a 1hr task over-runs by 4hrs (the watchdog cuts in at runtime +4hrs), so you lose 5 tasks' worth of processing, whereas a default 6hr task will run for 10hrs, only losing 1.67 tasks' worth of processing. This is very much splitting hairs though. Whatever suits you. As Mod.Sense says, don't make a dramatic change. Either run down tasks first before switching and/or only change runtime by one step at a time. If 1000 tasks at 1hr suddenly became 1000 at 6hrs you'd have a problem! |
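For anyone who wants to check the arithmetic in the post above, here is a rough Python sketch. The function names are made up for illustration, the 16 cores and the 4hr watchdog margin are simply the figures quoted in the post, and it assumes every task finishes exactly on its target runtime, which real tasks won't.

```python
def tasks_per_day(cores: int, runtime_hours: float) -> float:
    """Tasks finished per day if every core runs tasks back to back."""
    return cores * 24 / runtime_hours


def overrun_cost(runtime_hours: float, watchdog_margin_hours: float = 4.0) -> float:
    """Wall time used by a task the watchdog ends, in units of normal tasks.

    Assumes the watchdog cuts in at target runtime + margin, as stated in the post.
    """
    return (runtime_hours + watchdog_margin_hours) / runtime_hours


if __name__ == "__main__":
    # 16 cores at a 1hr target runtime (figures from the post).
    print(f"1hr target on 16 cores: {tasks_per_day(16, 1):.0f} tasks/day")
    for runtime in (1, 6):
        print(f"{runtime}hr target: a rogue task uses "
              f"{overrun_cost(runtime):.2f} tasks' worth of time")
```

So the shorter the target runtime, the more a single watchdog-terminated task costs relative to a normal one, though as said above it's splitting hairs.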
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
Hi Sid, It was me who had the wrong slant on things, probably from skimming posts that I read before I read yours. Thanks again for your help; I hope I can return the favour some day. Cheers Greg |
Jesse Viviano Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
Work unit 647152330 generated result files that were too big to upload when the work unit processing time limit is set to 24 hours. Please see my result log and the result log for someone who used a shorter work unit time limit. |
Jesse Viviano Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
"Work unit 647152330 generated result files that were too big to upload when the work unit processing time limit is set to 24 hours. Please see my result log and the result log for someone who used a shorter work unit time limit." I found the relevant BOINC event log entries by digging into the appropriate BOINC data directory. By default, this file is located at C:\ProgramData\BOINC\stdoutdae.old in Windows 7. The BOINC event log entries are listed below. 02-Feb-2015 13:14:03 [rosetta@home] Computation for task A__2_2015_01_29_B__2_2015_01_29_patchdock_split_02_150129_SAVE_ALL_OUT__242418_37_0 finished 02-Feb-2015 13:14:03 [rosetta@home] Output file A__2_2015_01_29_B__2_2015_01_29_patchdock_split_02_150129_SAVE_ALL_OUT__242418_37_0_0 for task A__2_2015_01_29_B__2_2015_01_29_patchdock_split_02_150129_SAVE_ALL_OUT__242418_37_0 exceeds size limit. 02-Feb-2015 13:14:03 [rosetta@home] File size: 65833683.000000 bytes. Limit: 50000000.000000 bytes I therefore will have to change my preferences to 12 hour work units to prevent this error once my current work units drain out, unless the file upload size limit is raised. |
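If you want to look for this kind of entry in your own event log, something like the Python sketch below might help. It is only an illustration: the regular expression is written against the single "File size ... Limit ..." line quoted above, and the function name is made up here, so other BOINC versions may format the line differently.

```python
import re

# Pattern based only on the log line quoted in the post above.
SIZE_LINE = re.compile(
    r"File size: (?P<size>[\d.]+) bytes\. Limit: (?P<limit>[\d.]+) bytes"
)


def parse_size_line(line: str):
    """Return (size_bytes, limit_bytes), or None if the line does not match."""
    match = SIZE_LINE.search(line)
    if match is None:
        return None
    return float(match.group("size")), float(match.group("limit"))


if __name__ == "__main__":
    line = ("02-Feb-2015 13:14:03 [rosetta@home] File size: 65833683.000000 bytes. "
            "Limit: 50000000.000000 bytes")
    size, limit = parse_size_line(line)
    print(f"over the limit by {size - limit:.0f} bytes "
          f"({size / limit:.2f}x the limit)")
```

To scan a whole log you would loop over the lines of stdoutdae.old (or stdoutdae.txt) and call `parse_size_line` on each one.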
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
"Work unit 647152330 generated result files that were too big to upload when the work unit processing time limit is set to 24 hours. Please see my result log and the result log for someone who used a shorter work unit time limit." Blimey! That's a new one! I've never come across an output file that big and I never knew there was a limit to the file size either. |
alvin Joined: 19 Jul 15 Posts: 5 Credit: 6,550,555 RAC: 0 |
Do you do GPU crunching or plan to? Do you support NVidia and/or ATI? |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Short answer to both: NO. |
Sandman192 Joined: 22 Sep 07 Posts: 16 Credit: 2,018,819 RAC: 0 |
Ever since Rosetta added a change so you can change "Target CPU run time" (TCPU), I have stopped getting work. I changed to a 2-hour TCPU and still got no work. I changed it to a day and a half and got no work. I get this message: 10/8/2020 8:23:22 PM | Rosetta@home | Sending scheduler request: To fetch work. I have 2 computers giving me this, which I call an error. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1684 Credit: 17,950,321 RAC: 23,118 |
"Ever since Rosetta added a change so you can change "Target CPU run time"=TCPU I have stopped getting work. I change to 2-hours TCPU and still got no work. I change it to a day and a half and no work." With the number of projects you are running, and the low priority for Rosetta work, your best chance of getting work is to set the Target CPU time to 2 hours and to set your cache to 0 days. In Computing preferences, under Other, set "Store at least" to 0.00 days of work and "Store up to an additional" to 0.01 days of work. Save the changes, then Update on the BOINC Manager of the computers for them to get those changes. After a while (it could be several hours depending on the work you presently have running, especially for the Q8600 system) they should start to get some Rosetta work occasionally. Increase the Resource share value for Rosetta if you want them to do more Rosetta work. Grant Darwin NT |
Sandman192 Joined: 22 Sep 07 Posts: 16 Credit: 2,018,819 RAC: 0 |
"With the number of projects you are running, and the low priority for Rosetta work," Rosetta is too low priority??? It's set at 1000%. How is that low??? Why should I set "Store at least 0.00 days and additional 0.01 days of work"? Maybe I want 10 days of work for both. And I want to get a day and a half of work. What's the point of giving these options if you can't use them? Are you saying I can't use an option that BOINC and Rosetta give me to use? I had no problems with Prime when it came to a WU that took a week to finish with 100% Priority. If Prime can do it then Rosetta needs to fix it the same way. Sounds like a bug somewhere. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1684 Credit: 17,950,321 RAC: 23,118 |
"Rosetta is to low priority??? It's set at 1000%. How is that low???" It is not 1000%. It is a ratio, compared to the values you have set for the other projects. And being connected to World Community Grid will result in odd behaviour, as they don't honour Resource share settings in the same way as all other BOINC projects do. "Why should I set 'Store at least 0.00 days and additional 0.01 days or work'?" So you can get more Rosetta work, as they have 3 day deadlines. That was why you posted, wasn't it - you wanted more Rosetta work? If not, then just leave things as they are. "Maybe I want 10 days of work for both. And I want to get a day and a half of work. What's the point of giving these options if you can't use them?" The point is they can be used, when appropriate. Just because you can do something doesn't mean you should. There is no need for 10 days' work if you are connected to multiple projects. You will never run out of work, so there is no need for any cache at all. That was the only reason for the cache settings - back in the days of dialup, people didn't have 24/7 internet access. "Are you saying I can't use an option that BOINC and Rosetta gives me to use?" No, what I am saying is that it makes no sense to use something when there is no need to use it, & especially so when making use of it will impact on what you are actually trying to do. There are a lot of settings you can change, and many of them will act against each other - e.g. setting a large cache when a project has short deadlines will impact on your ability to get work for that project when you are working on multiple other projects with long deadlines. Just because you can do something doesn't mean you should. "I have no problems with Prime when it came to a WU that took a week to finish with 100% Priority." Different projects, different deadlines. If you want to do multiple projects, then you need to use settings that make it possible.
Many of the settings are of use with only a single project, of limited use with a couple of projects, and of no use with multiple projects - such as you are doing. "Sounds like a bug somewhere." No bug, just you using values suitable for a single project while running lots of projects, all with differing deadlines. Grant Darwin NT |
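To illustrate the point about Resource share being a ratio rather than a percentage, here is a small Python sketch. The project names and share values are hypothetical, not taken from anyone's actual account, and it only shows the long-term proportion the shares imply; BOINC's real scheduling also has to take deadlines and other factors into account.

```python
def share_fractions(resource_shares: dict) -> dict:
    """Convert raw Resource share values into fractions of the total."""
    total = sum(resource_shares.values())
    return {name: share / total for name, share in resource_shares.items()}


if __name__ == "__main__":
    # Hypothetical values: every project set to 1000.
    shares = {
        "Rosetta@home": 1000,
        "Project B": 1000,
        "Project C": 1000,
        "Project D": 1000,
    }
    for name, fraction in share_fractions(shares).items():
        print(f"{name}: {fraction:.0%} of the host's time")
```

With four projects all set to 1000, each one is aiming for a quarter of the machine, so a value of 1000 on its own says nothing about priority.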
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
"Maybe I want 10 days of work for both" You do not want 10 days of work for Rosetta@home. Task deadlines are always 3 days from when they are delivered. If you ask for 10 days’ worth of work, you might get it – but stand no chance of completing more than 30% of it. All that will do is delay the analysis of the results of the other 70%, as researchers have to wait at least an extra 3 days for the server to resend the tasks to hosts that will actually do the work on time. |
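The 30% figure above is just the deadline divided by the cache size. A minimal Python sketch, assuming all the work is fetched at once and the host gets through it at exactly the estimated rate (the function name is made up for illustration):

```python
def completable_fraction(cache_days: float, deadline_days: float = 3.0) -> float:
    """Upper bound on the share of a work cache that can finish before the deadline."""
    if cache_days <= 0:
        return 1.0  # nothing queued, so nothing can be late
    return min(1.0, deadline_days / cache_days)


if __name__ == "__main__":
    for cache in (0.01, 1.5, 3, 10):
        print(f"cache of {cache} days: at most "
              f"{completable_fraction(cache):.0%} done within a 3 day deadline")
```

Any cache setting up to the deadline length is fine by this estimate; beyond that, the extra work can only miss its deadline.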
©2024 University of Washington
https://www.bakerlab.org