Message boards : Number crunching : System requirements????
Author | Message |
---|---|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I don't have a problem with this; maybe I wasn't clear in what I asked. A number of people are getting tasks that need more memory than they have, and so the tasks are failing. The hosts that got them before me had 256 or 512 MB of RAM and they errored, then the tasks were sent to me. Why are they getting them in the first place? Is the system failing to see the host's memory? pete. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I see what you mean about that specific work unit where the first host got an error about maximum memory exceeded. But I had thought that others were having problems getting any work sent to them. I'm not clear why the BOINC server code would schedule work for a client that won't be able to run it. Nor am I clear on why a 256MB client would get a workunit and fail on it, and others would get messages that no work was sent due to their allowed memory. I have seen in the past where the "short list" of available work gets overpopulated with high memory tasks, and so normal memory hosts aren't able to get work. It's frustrating all around, because there are normal tasks out there among the 20,000 available... but the BOINC server code doesn't work to keep any of them on the short list of tasks it keeps in shared memory. It only searches the first x tasks for one your machine can process, and if it doesn't find any then it gives up. That's why such problems are most common after an outage of the project server, because the server is getting hit with so many requests... and I think it will send low memory tasks to high memory hosts, just searching down the list until it reaches its search limit or finds something you can crunch. So these high memory tasks tend to float to the top of the list because many hosts are running past them in the list looking for work. Once the search limit is filled with high memory tasks, no one can get normal memory work until a high memory host comes along and pulls some of those tasks off. This is a BOINC issue and has been discussed on the BOINC boards. But I don't know if there is a plan to enhance the BOINC server's scheduler to better handle multiple types of tasks. Rosetta Moderator: Mod.Sense |
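The search behaviour described in the post above can be sketched as a toy model in Python. This is not BOINC server code: `Task`, `pick_task` and `search_limit` are names made up for illustration, and the memory figures are round numbers in the spirit of the thread.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    mem_bound_mb: float  # memory the task is marked as needing


def pick_task(short_list, usable_mem_mb, search_limit):
    """Scan at most `search_limit` slots and take the first task that fits."""
    for task in short_list[:search_limit]:
        if task.mem_bound_mb <= usable_mem_mb:
            short_list.remove(task)
            return task
    return None  # gave up, even though suitable work may sit further down


# Hypothetical short list: the front slots have filled with high-memory work.
short_list = [Task(f"high_{i}", 763) for i in range(3)] + [Task("low_0", 96)]

small_host = pick_task(short_list, usable_mem_mb=256, search_limit=3)
big_host = pick_task(short_list, usable_mem_mb=2048, search_limit=3)
small_retry = pick_task(short_list, usable_mem_mb=256, search_limit=3)

print(small_host)        # None: only high-memory tasks within the limit
print(big_host.name)     # high_0: a high-memory host clears one slot
print(small_retry.name)  # low_0: now reachable within the search limit
```

In this model the small host only gets work after a large host has removed one of the high-memory tasks from the front of the list, which is the starvation pattern the post describes.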
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
I see what you mean about that specific work unit where the first host got an error about maximum memory exceeded. But I had thought that others were having problems getting any work sent to them. I'm not clear why the BOINC server code would schedule work for a client that won't be able to run it. Nor am I clear on why a 256MB client would get a workunit and fail on it, and others would get messages that no work was sent due to their allowed memory. Seems there's a simple solution for this: give it a max memory. |
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
I see what you mean about that specific work unit where the first host got an error about maximum memory exceeded. But I had thought that others were having problems getting any work sent to them. I'm not clear why the BOINC server code would schedule work for a client that won't be able to run it. Nor am I clear on why a 256MB client would get a workunit and fail on it, and others would get messages that no work was sent due to their allowed memory. There are at least 2 types of Rosetta WUs, "low"-memory WUs and "high"-memory WUs. The "high"-memory WUs seem to be marked at 763 MB or something now, meaning you'll need at least 768 MB of installed memory, and you need to increase BOINC's memory preference to 99% if you've only got 768 MB installed. The "low"-memory WUs are incorrectly marked as needing 96 MB of memory. This is too low, since in practice they can use around 120 MB or something of "real" memory, probably more if the screensaver is used. Meaning, computers that have 256 MB of memory and a 50% memory preference will get assigned the WU, since 96 MB < 127 MB. But during crunching Rosetta@home uses more than 127 MB, and the WU is correctly aborted by BOINC. Whatever "extra" is assigned in the pagefile isn't a problem; the problem is that the "low"-memory WUs use more than 96 MB of "real" memory. Rosetta@home misconfiguring their "low"-memory WUs is a Rosetta@home problem, and has nothing to do with BOINC. Seasonal Attribution made the same mistake, setting the memory requirement to 256 MB while in reality it uses 430 MB, or 480 MB if it displays the screensaver. Still, they did write on their web page that the requirement was 1 GB... I have seen in the past where the "short list" of available work gets overpopulated with high memory tasks and so normal memory hosts aren't able to get work. It's frustrating all around, because there are normal tasks out in the 20,000 available... but the BOINC server code doesn't work to keep any on the short list of tasks it keeps in shared memory.
It only searches the first x tasks for one your machine can process, and if it doesn't find any then it gives up. That's why such problems are most common after an outage of the project server, because the server is getting hit with so many requests... and I think it will send low memory tasks to high memory hosts. Just searching down the list until it reaches its search limit, or finds something you can crunch. So these high memory tasks tend to float to the top of the list because many hosts are running past them in the list looking for work. Once the search limit is filled with high memory tasks, no one can get normal memory work until a high memory host comes along and pulls some of those tasks off. Yes, it is a current weakness in BOINC that low-memory tasks are assigned to computers with lots of memory, and that the Feeder doesn't keep a portion of low-memory tasks available if the same application has WUs with different memory requirements. Not sure if there are any current plans to change this, and even if there are, it can take many months before it's changed, unless Rosetta@home programs the changes themselves... But the Feeder can be configured to keep a certain number of Tasks available, as long as the Tasks are for different applications... So a workaround would be to use the same actual Rosetta application, but duplicate it and call one application "low-memory" and the other "high-memory"; as long as any "low-memory" WUs are generated, the Feeder will have some available. Now, this won't fix the other problem, that computers with lots of memory still grab low-memory WUs. To fix this problem, some changes must be made. Still, I would guess a small customization of the Rosetta@home scheduler to do something like this would work: "if BOINC usable memory >= 1 GB, set computer to 'only crunch high-memory-application' except if none available". "if BOINC usable memory < 1 GB, set computer to 'only crunch low-memory-application' except if none available".
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Rosetta@home mis-configuring their "low"-memory-wu's is a Rosetta@home-problem, and has nothing to do with BOINC. Where do you see the memory requirement assigned to a task? Is that in one of the xml files? I've been reviewing TCP traces of the client interactions with the project, and I don't recall seeing that in the data. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
"if BOINC usable memory < 1 GB, set computer to 'only crunch low-memory-application' except if none available". Here is also something wrong! Here you say that systems with little memory can crunch tasks that require large amounts of memory; that isn't going to work. I would suggest making the scheduler see 3 kinds of WUs and 3 types of machines. (Multiple-core issues have to be sorted out first, so BOINC sees memory per core and not total memory.) Machines with less than 512 MB of memory installed, machines with 512 to 1024 MB of memory installed, and machines with 1024 MB or more of memory installed. Then make it like this: give all machines WUs that fit within the above rules. If there are too many WUs with the "less than 512" specification, then allow "512 to 1024 MB" machines to crunch those WUs. If there are not enough "1024 or more" WUs, let the "1024 MB or more" PCs crunch "512 to 1024 MB" tasks. In this way I think the tasks get divided better, and we can make optimal use of the resources we have. [EDIT] The numbers mentioned above can be changed to optimize the spread of WUs. |
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
Where do you see the memory requirement assigned to a task? Is that in one of the xml files? I've been reviewing TCP traces of the client interactions with the project, and I don't recall seeing that in the data. It's like other things listed in client_state.xml, marked <rsc_memory_bound>. If you haven't connected to the Scheduling-server since you were assigned the Task(s), it's also listed in sched_reply_project-url.xml. "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
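For anyone who wants to pull that value out programmatically, here is a minimal Python sketch. It assumes the file is well-formed XML with `<workunit>` blocks that each carry a `<name>` and an `<rsc_memory_bound>`; the sample document and the WU names in it are invented, and a real client_state.xml may need more forgiving parsing than this.

```python
import xml.etree.ElementTree as ET

# Invented sample standing in for a client_state.xml excerpt (assumption).
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu_1</name>
    <rsc_memory_bound>100000000.000000</rsc_memory_bound>
  </workunit>
  <workunit>
    <name>example_wu_2</name>
    <rsc_memory_bound>800000000.000000</rsc_memory_bound>
  </workunit>
</client_state>"""


def memory_bounds(xml_text):
    """Return {workunit name: rsc_memory_bound in bytes}."""
    root = ET.fromstring(xml_text)
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_memory_bound")
        if name is not None and bound is not None:
            bounds[name] = float(bound)
    return bounds


bounds = memory_bounds(SAMPLE)
for name, value in bounds.items():
    print(name, int(value))
```

To try it on a real file, the text could be read from disk and passed to `memory_bounds`, keeping in mind that the layout assumed here is not taken from BOINC documentation.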
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"if BOINC usable memory < 1 GB, set computer to 'only crunch low-memory-application' except if none available". Not sure if you've ever run World Community Grid; there, users can choose to run only one or a few of the available applications, and if they haven't selected all applications they can also choose "if no work available for selected application(s), send any other work". But even if a user has chosen to accept only one type of task, BOINC still does the normal checks of whether the computer has enough memory, free disk space and so on to handle the task. Meaning, if there aren't any "low-memory" tasks available, a computer with less than 763 MB of usable BOINC memory still won't get the "high-memory" tasks. The only mistake I did make is that, if I'm not misremembering, the BOINC default is to use max 90% of memory when idle, so setting the memory limit to 1 GB will disallow a large portion of usable computers. Setting the limit to 900 MB or something would be better. I would suggest making the scheduler see 3 kinds of WUs and 3 types of machines. (Multiple-core issues have to be sorted out first, so BOINC sees memory per core and not total memory.) As long as Rosetta@home AFAIK only has 95.37 MB and 763 MB tasks, splitting things up further wouldn't change anything. Still, 512 MB of installed memory is a fairly popular choice. Going by "The Computational and Storage Potential of Volunteer Computing", covering active SETI@home computers in February 2006, and looking at figure 10, roughly 86.5% had at least 512 MB, while roughly 50.8% had at least 1 GB and roughly 12.2% at least 2 GB. Note, this is CPU power, not number of computers. Going by my own, very unofficial, and potentially very wrong data from February 2007, similar SETI@home data for active computers is: 4.65% more than 2 GB; 24.69% at least 2 GB; 29.28% more than 1 GB; 62.08% at least 1 GB; 68.15% more than 512 MB; 88.08% at least 512 MB. So my guess is that in February 2008 at least 30% of CPU power will have at least 2 GB of memory, and maybe 75% at least 1 GB.
As long as Rosetta@home has the "low"-memory WUs that use 130 MB or something, sneaking in under 512 MB will be less and less important as users upgrade to faster computers with more and more memory. How well these data fit Rosetta@home is another matter, but Rosetta@home can take a look at their database. Also, it is possible to wade through the stats dumps to gather info... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
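The rule quoted at the top of the post above, combined with the point that the normal memory check still applies, might look roughly like this in Python. It is a sketch of the idea only: `preferred_app`, `choose_app` and the `BOUNDS` table are hypothetical names, the two application labels follow the thread's "low-memory"/"high-memory" wording, and none of it is taken from the actual scheduler.

```python
def preferred_app(usable_mem_mb, threshold_mb=1024):
    """Which application the quoted rule would pin a host to."""
    return "high-memory" if usable_mem_mb >= threshold_mb else "low-memory"


def choose_app(usable_mem_mb, queued, mem_bound_mb, threshold_mb=1024):
    """Pick an application for a host, or None if nothing suitable is queued.

    queued: dict of application name -> number of tasks waiting
    mem_bound_mb: dict of application name -> memory the tasks are marked with
    """
    first = preferred_app(usable_mem_mb, threshold_mb)
    other = "low-memory" if first == "high-memory" else "high-memory"
    for app in (first, other):  # "except if none available": try the other
        has_work = queued.get(app, 0) > 0
        fits = mem_bound_mb[app] <= usable_mem_mb  # the normal memory check
        if has_work and fits:
            return app
    return None


# Task sizes as reported in the thread, in MB.
BOUNDS = {"low-memory": 95.37, "high-memory": 763}

# A big host falls back to low-memory work when no high-memory work exists.
print(choose_app(2048, {"low-memory": 10, "high-memory": 0}, BOUNDS))
# A small host is still refused high-memory work, even as a fallback.
print(choose_app(512, {"low-memory": 0, "high-memory": 10}, BOUNDS))
# With the cut-off lowered to 900 MB, a 950 MB host prefers high-memory work.
print(choose_app(950, {"low-memory": 5, "high-memory": 5}, BOUNDS, 900))
```

Keeping the memory check inside the fallback loop is what answers the objection raised earlier: a host below the high-memory bound never receives a high-memory task, whatever its preference says.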
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
Therefore we should ask astro if he can gather some data like that with his programs, and find out what is really needed, etc. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
It's like other things listed in client_state.xml, marked <rsc_memory_bound>. Thanks. Now I wonder what "rsc" stands for? I found this in the BOINC wiki: Memory Management. I've got 3 machines, all with 1GB or more and 2 CPUs. So all are capable of running the "high memory" tasks, which used to be limited to something just shy of 512MB. It appears now that perhaps there are higher memory tasks than that (based on the messages people are reporting). Reviewing Task Manager in Windows, I see one active Rosetta task using over 180MB for "Mem Usage" on each of these machines. I reviewed the client state files on all of these systems and the rsc_memory_bound for all WUs is 100000000, which I presume is in bytes, and that is how you get to the 96MB number you mentioned. Reading the above wiki link, it is unclear if these changes have actually been implemented. But it says that if the task exceeds the bound at any time, it will be aborted. It also says that on the server side, as the work unit is created, this is "an estimate", yet the client side sees it as a hard limit. So it seems to contradict itself. In any case, my 3 Windows machines are all running these tasks and all have exceeded the 96MB bound. I'm using BOINC 5.10.20. So I take it that the enforcement and methods outlined in the wiki are not implemented at that version. Does anyone know specifics about whether this has been implemented and, if so, in which BOINC version? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
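The arithmetic behind that number can be checked with a few lines of Python. `bytes_to_mib` is a helper named here for illustration, and the 800,000,000 figure is only a guess at what a task reported as roughly 763 MB might be marked with; it is not a value read from any file in this thread.

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to binary megabytes (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


low = bytes_to_mib(100_000_000)   # the rsc_memory_bound seen in the client state
high = bytes_to_mib(800_000_000)  # assumed value for the high-memory tasks

print(f"{low:.2f}")   # 95.37
print(f"{high:.2f}")  # 762.94
```

So a bound written as a round decimal number of bytes shows up as a slightly smaller, odd-looking figure once it is divided by 1024 twice.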
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
Thanks. Now I wonder what "rsc" stands for? Hmm, no direct idea, but I would guess something like ReSourCe... I found this in the BOINC wiki Yes, it's bytes; due to the (wrong) usage of "Mega", this becomes 95.37 MB. A quick look: I've got one running task that is marked as 95.37 MB, but according to Task Manager it has a peak of 203.8 MB, which is more than double what is specified in the WU... Reading the above wiki link, it is unclear if these changes have actually been implemented. But it says that if the task exceeds the bound at any time, it will be aborted. It also says that on the server side, as the work unit is created, this is "an estimate", yet the client side sees it as a hard limit. So it seems to contradict itself. Each WU has 5 different limits: <rsc_fpops_est>, <rsc_fpops_bound>, <rsc_memory_bound>, <rsc_disk_bound> and <delay_bound>. <delay_bound> is used by the Scheduling-server when a Task is assigned, where report_deadline = now + delay_bound. It is used by the client to try to return all Tasks by their deadline; newer clients are better at this than older clients. <rsc_fpops_est>, together with <duration_correction_factor>, some other parameters and <delay_bound>, is used by the Scheduling-server to see if a Task can be sent or not. Note, for some reason it was decided that a client that has no task in a project can still get 1 task, even if it can't meet <delay_bound>. On the client, it is used to estimate remaining CPU time. <rsc_disk_bound> is enforced by the Scheduling-server: if a computer doesn't have enough BOINC-usable disk space, the task will never be assigned. On the client, if the result file(s) exceed <rsc_disk_bound>, the Task gets aborted. Not sure if it also includes any other temporary files in its "Slots" directory or not... <rsc_fpops_bound> is client-side only.
Since the client doesn't really know how many flops a computer has used on a Task, in reality <rsc_fpops_bound> is a max CPU-time limit: if current_cpu_time > rsc_fpops_bound / p_fpops => abort_task. This means that anyone who runs an "optimized" BOINC client, or uses another method to artificially increase p_fpops, has a bigger chance of hitting this limit... <rsc_memory_bound> is the problematic one... The Scheduling-server has for a very long time also enforced this limit, but with v5.8.xx this has been slightly changed. For pre-v5.8.xx clients: if rsc_memory_bound > m_nbytes => don't assign task. For v5.8.xx or later clients: if rsc_memory_bound > m_nbytes * max of (ram_max_used_busy_pct or ram_max_used_idle_pct) / 100 => don't assign task. Client-side, on the other hand, <rsc_memory_bound> has never been enforced, and not even in v6.1.x is it being used. But v5.8.xx clients and later do enforce the 2 limits ram_max_used_busy_pct and ram_max_used_idle_pct. Meaning, if a Task uses more memory than the max of these 2 limits, it will be aborted. Since the memory limit is taken care of by the 2 RAM-usage parameters, it was likely decided that it's not important whether <rsc_memory_bound> is exceeded or not. Also, BOINC supports applications varying their memory usage depending on available memory and memory preferences, meaning <rsc_memory_bound> can be set low so all computers can get the task, and on low-memory computers this limit isn't exceeded. On computers with lots of memory, on the other hand, maybe 3x more memory than <rsc_memory_bound> is being used. Due to this, I would guess it's unlikely <rsc_memory_bound> will ever be enforced by the client. While client-side enforcement of <rsc_fpops_bound> and <rsc_disk_bound> has been included since pre-v3.xx, and Scheduling-server enforcement of <rsc_memory_bound> and <rsc_disk_bound> has been included since 2004, the big weakness is that the number of CPUs is not taken into consideration on either client or server.
The only, very limited, support is that after a Task has already started it can be paused if memory usage gets too high, but rather than trying to start multiple "high-memory" tasks it would be better to run 1 "high-memory" and 1 "low-memory" task instead.... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
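The two conditions written out in the post above translate almost directly into Python. The sketch below only transcribes them as stated in the thread: the function names are hypothetical, the version is passed as a `(major, minor)` tuple for convenience, the default percentages are illustrative, and nothing here is copied from BOINC source.

```python
MB = 1024 * 1024


def server_may_assign(rsc_memory_bound, m_nbytes, client_version,
                      ram_max_used_busy_pct=50.0, ram_max_used_idle_pct=90.0):
    """Server-side memory check as described in the post (all sizes in bytes)."""
    if client_version < (5, 8):
        # Older clients: compare against installed memory only.
        return rsc_memory_bound <= m_nbytes
    # v5.8 and later: compare against the larger of the two preferences.
    usable = m_nbytes * max(ram_max_used_busy_pct, ram_max_used_idle_pct) / 100
    return rsc_memory_bound <= usable


def client_should_abort(current_cpu_time, rsc_fpops_bound, p_fpops):
    """Client-side check: rsc_fpops_bound acts as a CPU-time limit in seconds."""
    return current_cpu_time > rsc_fpops_bound / p_fpops


# 256 MB host with both preferences at 50%: the 100,000,000-byte task is sent.
print(server_may_assign(100_000_000, 256 * MB, (5, 10), 50.0, 50.0))  # True

# Same task and CPU time, but a benchmark inflated from 1e9 to 2e9 flops
# halves the allowed time (3600 s down to 1800 s) and triggers the abort.
print(client_should_abort(3000, 3.6e12, 1e9))  # False
print(client_should_abort(3000, 3.6e12, 2e9))  # True
```

The last two lines illustrate why an artificially high p_fpops makes the limit easier to hit: the bound in flops stays the same while the time it is divided into shrinks.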
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
So if I get this right, they have to rewrite the largest part of the BOINC program to work with multiple CPUs when checking memory etc... |