Message boards : Number crunching : Rosetta needs 6675.72 MB RAM: is the restriction really needed?
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> 05-May-2021 07:48:46 [Rosetta@home] Scheduler request completed: got 0 new tasks
> On computers with 8 GB of RAM + 8 GB of swap space. I know...

On my 8GB laptop, I've amended the amount of RAM allocated within BOINC (Options / Computing preferences / Disk & Memory tab) as follows, and work comes down OK:

When computer is in use, use at most 75%
When computer is not in use, use at most 99%

I don't believe it's the amount of RAM you have that matters, but how much of it is allocated to BOINC in that setting. An 8GB machine should work with all the tasks currently being issued.
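To illustrate the arithmetic behind that suggestion - a minimal sketch, assuming an 8 GB host, the percentages suggested above, and the 6675.72 MB figure from the thread title:

```python
# Rough arithmetic for an 8 GB host with the BOINC memory settings suggested above.
total_ram_mb = 8 * 1024                      # 8192 MB
in_use_limit_mb = 0.75 * total_ram_mb        # 6144 MB available to BOINC while in use
idle_limit_mb = 0.99 * total_ram_mb          # ~8110 MB available when the machine is idle
task_requirement_mb = 6675.72                # declared requirement from the thread title

print(in_use_limit_mb >= task_requirement_mb)   # False: the task can't start while the machine is in use
print(idle_limit_mb >= task_requirement_mb)     # True: the task can start once the machine goes idle
```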
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> some new messages from the server till the 30th of April ...

I haven't been in any contact with anyone again, but it seems various changes promised have come through, in the way they see fit.

From examining my current client_state.xml file, the tasks I have show the following RAM & disk requirements:

rb_05_09_74941_72921_ab_t000__robetta_cstwt_5.0_FT: RAM 1908MB, Disk 3815MB
miniprotein_relax11: RAM 3338MB, Disk 3815MB
jgSP_01: RAM 3338MB, Disk 3815MB
rb_05_09_74951_72937_ab_t000__h001_robetta: RAM 3338MB, Disk 3815MB
pre_helical_bundles: RAM 6676MB, Disk 8583MB
sap_h15_l3_h12_l1_h9_l2: RAM 8583MB, Disk 1908MB
rb_05_09_74865_72931__t000__ab_robetta: RAM 8583MB, Disk 3815MB

I've ranked the task-types in order of RAM demand, followed by disk demand. At the bottom end, some tasks are asking for slightly less than 2GB - maybe not sufficiently low for some 2GB hosts to run, depending how they're set up, but certainly small enough for 4GB hosts. And at the top end, some require even more than seen before - up from 6.676GB to 8.583GB, though with small disk demands.

Hopefully some people with more constrained hosts are seeing something coming through. Not quite back to how it was, but close, while more capable machines are getting tasks commensurate with their greater capacity.

And going back to the proxy I'm using for downloadability again - In Progress tasks:
Pre increase in RAM & disk requirements - 550k in progress
This figure went up to 431k, then dropped to 380k when we had problems last week
Now back up to 432k - 21.5% below the max, 35.8% above the min
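For anyone who wants to pull the same figures from their own client, here is a minimal sketch that reads the per-workunit rsc_memory_bound and rsc_disk_bound values from client_state.xml. The file path is an assumption (a typical Linux package install); adjust it for your OS, and note that BOINC stores these bounds in bytes.

```python
# Sketch: list declared RAM/disk bounds per workunit from BOINC's client_state.xml.
import xml.etree.ElementTree as ET

STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed path; varies by OS/install

def to_mb(value_bytes: float) -> float:
    return value_bytes / (1024 * 1024)

root = ET.parse(STATE_FILE).getroot()
rows = []
for wu in root.iter("workunit"):
    name = wu.findtext("name", default="?")
    ram = float(wu.findtext("rsc_memory_bound", default="0"))
    disk = float(wu.findtext("rsc_disk_bound", default="0"))
    rows.append((ram, disk, name))

# Rank by declared RAM requirement, then disk, as in the list above.
for ram, disk, name in sorted(rows):
    print(f"RAM {to_mb(ram):7.0f} MB  Disk {to_mb(disk):7.0f} MB  {name}")
```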
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> I haven't been in any contact with anyone again, but it seems various changes promised have come through in the way they see fit.

Hmm... not sure if my cache is representative, but it's pretty much all "pre_helical_bundles" tasks demanding a lot of RAM again, and in progress tasks have plummeted to 382k as a result.

I haven't got a clue what's going on now tbh <sigh>
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> Hmm... not sure if my cache is representative, but it's pretty much all "pre_helical_bundles" tasks demanding a lot of RAM again and in progress tasks have plummeted to 382k as a result.

I'm pretty sure the "pre_helical_bundles" were the ones that first got the larger configuration values, at the time the 20 million Jobs were released. It's going to take a while yet for the rest of those Tasks with the excessive values to clear out of the system.

I think we worked it out as around 2 months - but that was if they were the only ones being processed. With some other Tasks coming through & being processed, it will take longer still for the mis-configured ones to finally clear from the system completely.

Grant
Darwin NT
MJH333 · Joined: 29 Jan 21 · Posts: 18 · Credit: 6,285,104 · RAC: 14,873
Hi Grant

May I ask you a quick question about this? My 4-core laptop (no SMT) is attempting to run 4 pre_helical_bundles tasks, but one is shown as "Waiting for memory". I have set RAM usage to 95% in Computing preferences (whether or not in use), and the system monitor (in Linux Mint) shows that I am using only 2.6GiB of 7.6GiB (34.3%) memory. So I am puzzled as to why the 4th task is not running.

The system monitor also says "Cache 4.6GiB". Is that counting against the 95% limit? I tried a 99% limit but that made no difference.

Any thoughts you have on this would be much appreciated. I'm a newbie cruncher, so I'm probably doing something wrong!

Mark
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> Any thoughts you have on this would be much appreciated. I'm a newbie cruncher, so I'm probably doing something wrong!

Nope, the problem is that there is an (extremely large) batch of work that was incorrectly configured for its minimum RAM & disk requirements. Unless you've actually got enough free RAM to meet those requirements, BOINC won't let one of those Tasks run (even though the actual usage values are only a fraction of the required values). Same for the disk space requirements.

If you set your BOINC Manager to Advanced view & look at Tools, Event Log, you should see some messages there relating to how much RAM you have, and how much RAM BOINC thinks it will need in order to run the Task, when it tries to get more work or start the paused Task.

Grant
Darwin NT
MJH333 · Joined: 29 Jan 21 · Posts: 18 · Credit: 6,285,104 · RAC: 14,873
> Nope, the problem is that there is an (extremely large) batch of work that was incorrectly configured for its minimum RAM & disk requirements.

Thank you, that's really helpful.

Mark
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> > Any thoughts you have on this would be much appreciated. I'm a newbie cruncher, so I'm probably doing something wrong!
> Nope, the problem is that there is an (extremely large) batch of work that was incorrectly configured for its minimum RAM & disk requirements.

I'm not sure we can still say these pre_helical_bundle tasks were misconfigured. Obviously a lot have been worked through, and there were a lot to start with, but the total queued to run is down to 14m now, and it may be that this is what was intended, with some other newer tasks demanding even more RAM. When other task-types get exhausted, it doesn't seem like it's a whole 2 months until more come through - it's been a week or less (no idea how or why).

I've taken a glance at my tasks while I'm currently away, and new, different ones have just started coming down. They must require less RAM, as In Progress tasks have quickly shot up by 60k to nearly 440k. I've given up trying to understand it from afar. It is what it is.

What has become obvious is that the RAM required to <start> running isn't the same as the actual RAM required <while> running, so it always looks like there's plenty of RAM left over, just not all cores utilised.

When it happens to me on my laptop, I set No New Tasks and suspend all unstarted tasks; then, as each running task ends, more RAM becomes available. Once there's enough for the problem task to start, I unsuspend one task at a time and find all my cores can run again. I know it's a faff, but it's the only way I've found to get around it.
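For anyone who would rather script that workaround than click through the Manager, here is a rough sketch using the boinccmd command-line tool. The project URL and the task names are placeholders/assumptions; take the real task names from the output of boinccmd --get_tasks.

```python
# Rough sketch of the manual workaround above, driven via the boinccmd CLI instead
# of the Manager GUI. PROJECT_URL and the task names are assumptions/placeholders;
# boinccmd must be on PATH and talking to the local client.
import subprocess

PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"  # assumed Rosetta@home project URL

def boinccmd(*args: str) -> str:
    return subprocess.run(["boinccmd", *args], check=True,
                          capture_output=True, text=True).stdout

# 1) Stop fetching new Rosetta work ("No New Tasks").
boinccmd("--project", PROJECT_URL, "nomorework")

# 2) Suspend the tasks that haven't started yet (names taken from --get_tasks output).
unstarted = ["example_task_name_1", "example_task_name_2"]   # placeholders
for name in unstarted:
    boinccmd("--task", PROJECT_URL, name, "suspend")

# 3) Later, once a running task finishes and RAM frees up, resume one task at a time
#    and re-enable work fetch.
boinccmd("--task", PROJECT_URL, unstarted[0], "resume")
boinccmd("--project", PROJECT_URL, "allowmorework")
```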
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> What has become obvious is that the RAM required to <start> running isn't the same as the actual RAM required <while> running, so it always looks like there's plenty of RAM left over, just not all cores utilised.

It's not a case of how much is needed to start or to actually run (the Tasks that can use up to 4GB of RAM only need several hundred MB or so when they first start) - it's purely about the maximum amount they claim they will need. The problem is the configuration value that says xxGB is required (even though it isn't).

As you've noted - if Tasks with lower configured RAM sizes are already running, then one that claims it needs huge amounts of RAM can't start, due to the RAM already in use. But if that Task with the large configured RAM requirement is already running, then Tasks that say they require much, much less can start up without issue.

But yes - it is very similar to the days of DOS & Config.sys: spending hours changing the order in which things were loaded, so that the ones that needed heaps of RAM to start (but less to actually run) were loaded before all the others with lower startup RAM requirements, just so you could get everything you needed running to support the hardware and software you were using.

Grant
Darwin NT
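A simplified illustration of that start-versus-run asymmetry - a sketch only, not the client's exact algorithm. It assumes a not-yet-started task is judged by its declared memory bound, while already-running tasks count at their measured working set, and it uses round numbers from this thread:

```python
# Illustration: why start order matters when declared bounds far exceed real usage.
# Numbers loosely based on the thread: an 8 GB host with BOINC allowed ~99% of RAM,
# pre_helical_bundles declaring 6676 MB but actually using roughly 500 MB.

ALLOWED_RAM_MB = 0.99 * 8 * 1024          # what BOINC may use on this host (~8110 MB)

def can_start(candidate_bound_mb, running_working_sets_mb):
    # A task that hasn't started yet is judged by its declared rsc_memory_bound;
    # tasks already running are counted at their measured working set.
    return sum(running_working_sets_mb) + candidate_bound_mb <= ALLOWED_RAM_MB

# Three small tasks already running (~500 MB each), big-bound task tries to start: blocked.
print(can_start(6676, [500, 500, 500]))        # False -> "Waiting for memory"

# Big-bound task started first (actual use ~500 MB), a small-bound task joins afterwards: fine.
print(can_start(1908, [500]))                  # True
```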
MJH333 · Joined: 29 Jan 21 · Posts: 18 · Credit: 6,285,104 · RAC: 14,873
> When it happens to me on my laptop, I set No New Tasks and suspend all unstarted tasks; then, as each running task ends, more RAM becomes available. Once there's enough for the problem task to start, I unsuspend one task at a time and find all my cores can run again.

Sid and Grant,

Thank you for your further thoughts on this. I find it puzzling, as yesterday I had 4 of these pre_helical_bundles tasks running at the same time on the laptop, whereas the day before it could only manage 3. But anyway, next time it happens I will try Sid's suggestion for unclogging the bottleneck - thanks for that.

Mark
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> > What has become obvious is that the RAM required to <start> running isn't the same as the actual RAM required <while> running, so it always looks like there's plenty of RAM left over, just not all cores utilised.
> It's not a case of how much is needed to start or to actually run (the Tasks that can use up to 4GB of RAM only need several hundred MB or so when they first start) - it's purely about the maximum amount they claim they will need.

I know that's the case. What I mean by "not misconfigured" is that they've consciously been configured in the way we see, whether or not that matches what an individual task uses, or makes sense to us at our end.

On the 4-core laptop I'm using right now, 3 tasks are "pre_helical_bundles", which we know are set up for 6675MB RAM (and 8583MB disk space), but which are likely to have been created before these adjustments were made and which you're suggesting are "misconfigured". Yet their "Virtual memory size" while running is only 420MB, 472MB and 493MB.

The 4th core is running a relatively new task, "f60030e2d399cf97bd574292ff707fcd_fae0a51cf659d300dc90ab2264960253_barrel6_L2L5L5L7L5L7L3_relax_SAVE_ALL_OUT_1393099_4" (should I call this "barrel6"?), whose Virtual memory size is only 380MB, but which, looking at my client_state.xml file, is set up to ask for 8583MB RAM and 1907MB disk space. You might call this misconfigured too, but given it was created after the adjustments were made, it would be deliberate, so who's to call it misconfigured? The individual task is way out of line, but if it's configured to cover the entire batch issued, which it is aiui, the only person who'd know is the researcher themselves.

I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> The 4th core is running a relatively new task, "f60030e2d399cf97bd574292ff707fcd_fae0a51cf659d300dc90ab2264960253_barrel6_L2L5L5L7L5L7L3_relax_SAVE_ALL_OUT_1393099_4" (should I call this "barrel6"?), whose Virtual memory size is only 380MB, but which, looking at my client_state.xml file, is set up to ask for 8583MB RAM and 1907MB disk space. You might call this misconfigured too, but given it was created after the adjustments were made, it would be deliberate, so who's to call it misconfigured?

Me. The peak working set size for all of the Tasks I've done of that type so far is less than 500MB. Asking for 17 times more RAM than is necessary indicates it's not right - that makes it mis-configured. While there may be some Tasks in the batch that will need more RAM, I haven't seen any work in the past where the difference between the most & least RAM actually used has been double, let alone 17 times, the average amount.

> I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.

Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

It's been, what, 12 months since they announced the batch of larger RAM requirement Tasks that were going to be released? And at the time they were no more than 4GB (apart from a later batch that had some sort of memory leak...). So why now set such high RAM requirements, when the largest Task I've seen in months has used only 1.5GB - which is still way, way less than even the lowest of the current minimum RAM requirement values being used?

Grant
Darwin NT
Kissagogo27 · Joined: 31 Mar 20 · Posts: 86 · Credit: 2,916,897 · RAC: 2,587
Hi, new sorts of tasks downloaded on 2GB machines:
77701868c29166a607c77ce7756b607a_1763459782ee1b1b0b72a3468e89a34a_1kq1A_L4L5L9L8L5L4

And some pictures of the 3 computers in use ...
https://www.casimages.com/i/210515013640487009.png.html
https://www.casimages.com/i/210515013640709918.png.html
https://www.casimages.com/i/210515013908653872.png.html
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> > I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
> Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

Having said that, the amount of In Progress work is the highest it's been since the problems started (449,519), and the Successes in the last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

As long as these excessive requirement values are about, it looks like it's going to be a case of whether things line up or not. If a system owes debt to Rosetta and has no other Tasks already running, it's able to start a larger RAM requirement Task; then the next Task(s) it tries to run are smaller RAM requirement Tasks, so they start up OK. So we end up with plenty of work being done.

But if things don't line up - some small RAM requirement Tasks are already running, so it can't start the large RAM requirement Task. If it has a cache (and worse yet, a large cache), then the system may load up with more work for its other project(s). So we end up with much less Rosetta work being done, and it will be some time before the cached work for the other projects has been cleared & it can get & run more work for Rosetta - and if the first Tasks it then gets aren't the small RAM requirement ones, it will just continue to result in Tasks waiting for RAM (that they don't actually need).

Grant
Darwin NT
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> > > I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
> > Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.
> Having said that, the amount of In Progress work is the highest it's been since the problems started (449,519), and the Successes in the last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

I take all your points. I'm on the user side of the fence too, and I superficially agree that's how it appears. Trouble is, all the reasons, causes and needs are on the other side of the server divide, so as there's no way I can know anything about them, there's similarly no way I can tell whether anything we're seeing is necessary or not. Nor am I going to tell people who are a million times more knowledgeable than me that they're setting everything up brainlessly wrong - especially having already revisited their assumptions. Maybe I'm too squeamish.

That 'In Progress' tasks have hit 449k (18.4% below the pre-April peak but 41.2% above the low) seems more a sign that 2GB & 4GB hosts are able to run more tasks because of settings made by the same guys who decided on the huge settings. That also makes me think they're doing only what they need to do, and have reasons for doing it.

And the other major factor is that, if they have a particular batch of work which makes particularly large resource demands because of the nature of the questions they want answered, they're not going to stop asking those questions just because 50% or more of hosts don't have the capacity to assist in answering them. Because up to 50% will have the capacity, and the bottom line is getting the answer to their question and nothing else, even if that means it takes a little longer.

We've asked whether hosts with fewer resources can continue to contribute on some work, and they've gone away and changed things so that they can, which is what kissagogo27 is telling us above - great news. Beyond that, we get into the sphere of the project only asking questions that are no larger than they were before. It's apparent that's not the nature of this project. Some hosts will no longer be able to contribute here with only the same resources asked of them several years ago. And new hosts will arrive that do have those resources available.
mikey · Joined: 5 Jan 06 · Posts: 1895 · Credit: 9,167,577 · RAC: 4,048
> > > I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
> > Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.
> Having said that, the amount of In Progress work is the highest it's been since the problems started (449,519), and the Successes in the last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

I'm running Rosetta on a machine with 16GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.
Sid Celery · Joined: 11 Feb 08 · Posts: 2124 · Credit: 41,214,422 · RAC: 10,711
> > > I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
> > Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.
> Having said that, the amount of In Progress work is the highest it's been since the problems started (449,519), and the Successes in the last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

Yeah, it's 8GB hosts (who should be OK, tbh, depending on settings), 4GB hosts (for whom it's on the cusp) and particularly 2GB hosts (who are almost completely excluded, but ought to have bits and pieces coming through now) that are the issue.

When or if hosts upgrade it'll improve over time, but it'll never quite get back to where it was, though all these tasks ought to be much more productive than they were before on the 60% - 81% of hosts that haven't been affected throughout.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> And the other major factor is that, if they have a particular batch of work which makes particularly large resource demands because of the nature of the questions they want answered, they're not going to stop asking those questions just because 50% or more of hosts don't have the capacity to assist in answering them. Because up to 50% will have the capacity, and the bottom line is getting the answer to their question and nothing else, even if that means it takes a little longer.

That's all well and good - but it is absolutely insane to set such high minimum requirements for Tasks that don't come anywhere close to using the amounts they require, as it stops many systems from being able to process them, or results in cores/threads going unused by Rosetta that are available for its use.

As I mentioned before - we have had high RAM requirement Tasks on the project before: Tasks that required more than double the amount of RAM of any Task I have seen since this excessive configuration value issue started. And people were able to continue processing the existing Tasks at the time without issue, as none of them had excessive minimum RAM or disk space requirements above & beyond what they actually required.

Setting a limit that is double what is actually required, just in case, is one thing. But to have a requirement that is 17 times larger than the largest value ever used is beyond ridiculous, and results in them having fewer resources to process the work they want done. If they really want this work processed, then they should make use of the resources that are available & not block systems that are capable of processing it with unrealistic & excessive configuration values.

Grant
Darwin NT
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1680 · Credit: 17,835,753 · RAC: 22,968
> I'm running Rosetta on a machine with 16GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.

Since you make use of only half of your available cores/threads, it's not surprising that you're not having issues. If you were to use all of your cores & threads, then with so little RAM that system would be having issues just like all the others are.

Grant
Darwin NT
Bryn Mawr · Joined: 26 Dec 18 · Posts: 393 · Credit: 12,108,643 · RAC: 5,969
> > I'm running Rosetta on a machine with 16GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.
> Since you make use of only half of your available cores/threads, it's not surprising that you're not having issues. If you were to use all of your cores & threads, then with so little RAM that system would be having issues just like all the others are.

Surely the point is how many cores are used for Rosetta, not how many cores are in use overall.

I run a 3700X and a 3900. They're restricted to 5 & 6 Rosetta WUs respectively, but they also run 3/4 CPDN WUs, and the rest of the cores are on WCG or TN-Grid, so all 16/24 cores are running constantly. All within 16GB per machine, with zero problems.

I wouldn't consider filling either machine with just Rosetta, with or without the current config problem, because of the L3 cache requirements.