Message boards : Number crunching : Waiting to Run
dcdc · Joined: 3 Nov 05 · Posts: 1832 · Credit: 119,821,902 · RAC: 13,431
Does it only switch to "Waiting to Run" when you move the mouse? With your current settings I believe that an interrupted task will go back to its last checkpoint (probably 0%). When you move the mouse or press a key, the RAM BOINC may use drops from 90% to 50%, which may push the task into "Waiting to Run" and back to 0% complete. Can you leave the work in memory (paged to disk) while the task is suspended? That way it won't have to drop back to the last checkpoint and can continue from where it left off, i.e. change this to 1:
<leave_apps_in_memory>0</leave_apps_in_memory>
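To illustrate dcdc's point, here is a simplified model of how the busy and idle RAM limits can leave a task waiting. It is a sketch under stated assumptions, not BOINC's actual scheduler, whose logic is more involved: the function names ram_limit_mb and runnable_tasks and the task sizes are hypothetical. It only shows how the same set of tasks can fit under the idle limit (90%) but not under the busy limit (50%).

```python
from typing import List

def ram_limit_mb(total_ram_mb: float, user_active: bool,
                 busy_pct: float, idle_pct: float) -> float:
    """RAM budget in MB for the current user state (simplified model)."""
    pct = busy_pct if user_active else idle_pct
    return total_ram_mb * pct / 100.0

def runnable_tasks(working_sets_mb: List[float], total_ram_mb: float,
                   user_active: bool, busy_pct: float = 50.0,
                   idle_pct: float = 90.0) -> List[bool]:
    """Mark each task True (running) or False ("Waiting to Run").

    Hypothetical rule: tasks are admitted in order while their combined
    working sets stay under the current RAM limit.
    """
    limit = ram_limit_mb(total_ram_mb, user_active, busy_pct, idle_pct)
    used = 0.0
    result: List[bool] = []
    for ws in working_sets_mb:
        if used + ws <= limit:
            used += ws
            result.append(True)
        else:
            result.append(False)
    return result

# Hypothetical machine: 2 GB of RAM, four tasks of about 400 MB each.
tasks = [400.0, 400.0, 400.0, 400.0]
print(runnable_tasks(tasks, 2048.0, user_active=False))  # idle limit: 1843.2 MB
print(runnable_tasks(tasks, 2048.0, user_active=True))   # busy limit: 1024 MB
```

In this model all four tasks run while the machine is idle, but only two while the user is active; passing busy_pct=65.0 lets a third task keep running, which is the direction of the change suggested in the next post.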
Link · Joined: 4 May 07 · Posts: 356 · Credit: 382,349 · RAC: 0
Quote: "Actually, shouldn't these be the other way around:"
You might have missed my first post:
<leave_apps_in_memory>1</leave_apps_in_memory>
<ram_max_used_busy_pct>65.000000</ram_max_used_busy_pct>
If it still goes into "Waiting to Run" with those settings, I'd try:
<ram_max_used_busy_pct>70.000000</ram_max_used_busy_pct>
<ram_max_used_idle_pct>70.000000</ram_max_used_idle_pct>
Then set this in your client_state.xml (exit BOINC first):
<user_run_request>1</user_run_request>
You probably have a 2 (run based on preferences) there right now; 1 means run always.
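For anyone who prefers to script the preference edits, a minimal sketch using Python's standard xml.etree.ElementTree follows. It assumes global_prefs_override.xml already exists with a <global_preferences> root, like the file posted later in this thread; the function names set_prefs and apply_overrides are hypothetical. It deliberately does not touch client_state.xml, which is best edited by hand with BOINC stopped, as described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def set_prefs(root: ET.Element, updates: Dict[str, str]) -> None:
    """Set each preference as a direct child of <global_preferences>, adding it if missing."""
    if root.tag != "global_preferences":
        raise ValueError(f"expected <global_preferences>, got <{root.tag}>")
    for tag, value in updates.items():
        elem = root.find(tag)  # first direct child with this tag, or None
        if elem is None:
            elem = ET.SubElement(root, tag)
        elem.text = value

def apply_overrides(path: str, updates: Dict[str, str]) -> None:
    """Rewrite the override file in place (stop BOINC before calling this).

    Note: ElementTree drops XML comments, and newly added elements are not indented.
    """
    tree = ET.parse(path)
    set_prefs(tree.getroot(), updates)
    tree.write(path, encoding="utf-8")
```

For example, with BOINC stopped: apply_overrides("/var/lib/boinc-client/global_prefs_override.xml", {"leave_apps_in_memory": "1", "ram_max_used_busy_pct": "65.000000"}). That path is an assumption (the usual data directory of the Debian/Ubuntu boinc-client package); adjust it to your own BOINC data directory. A running client can also be told to re-read the file with boinccmd --read_global_prefs_override.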
E the P · Joined: 5 Jun 06 · Posts: 36 · Credit: 28,333,251 · RAC: 0
OK, I've made the two suggested changes: setting leave_apps_in_memory to "1" and raising the busy memory limit to 65%. I'll report back on how it works.
E the P · Joined: 5 Jun 06 · Posts: 36 · Credit: 28,333,251 · RAC: 0
Sadly, same results. It runs for about a day to a day and a half and then hangs. Here is my latest override file:
---------------------------------------------------------
<global_preferences>
  <run_on_batteries>0</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>0</run_gpu_if_user_active>
  <idle_time_to_run>0.000000</idle_time_to_run>
  <start_hour>0.000000</start_hour>
  <end_hour>0.000000</end_hour>
  <net_start_hour>0.000000</net_start_hour>
  <net_end_hour>0.000000</net_end_hour>
  <leave_apps_in_memory>1</leave_apps_in_memory>
  <confirm_before_connecting>0</confirm_before_connecting>
  <hangup_if_dialed>0</hangup_if_dialed>
  <dont_verify_images>0</dont_verify_images>
  <work_buf_min_days>0.100000</work_buf_min_days>
  <work_buf_additional_days>0.250000</work_buf_additional_days>
  <max_ncpus_pct>50.000000</max_ncpus_pct>
  <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
  <disk_interval>60.000000</disk_interval>
  <disk_max_used_gb>100.000000</disk_max_used_gb>
  <disk_max_used_pct>50.000000</disk_max_used_pct>
  <disk_min_free_gb>0.000000</disk_min_free_gb>
  <vm_max_used_pct>75.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>65.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
  <max_bytes_sec_up>0.000000</max_bytes_sec_up>
  <max_bytes_sec_down>0.000000</max_bytes_sec_down>
  <cpu_usage_limit>100.000000</cpu_usage_limit>
  <suspend_cpu_usage>0.000000</suspend_cpu_usage>
</global_preferences>
Link · Joined: 4 May 07 · Posts: 356 · Credit: 382,349 · RAC: 0
Have you tried the second part of my earlier suggestion, i.e. setting both <ram_max_used_busy_pct> and <ram_max_used_idle_pct> to 70.000000 (and <user_run_request> to 1) if the task still goes into "Waiting to Run"?
Also post the log from when BOINC suspends the task. It might also help to enable the <cpu_sched>, <cpu_sched_debug> and <mem_usage_debug> log flags in cc_config.xml, so we can see more clearly in the log what's going on. What is the size of your swap space (the Linux counterpart of the Windows pagefile)?
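The log flags Link mentions go in a <log_flags> section of cc_config.xml in the BOINC data directory. The sketch below only generates that file's contents with Python's standard library (ET.indent needs Python 3.9 or later); the function name build_cc_config is hypothetical. If you already have a cc_config.xml with other options, add the three flags to its existing <log_flags> section instead of replacing the file.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

def build_cc_config(flags: Iterable[str]) -> str:
    """Return cc_config.xml text with each given log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")

print(build_cc_config(["cpu_sched", "cpu_sched_debug", "mem_usage_debug"]))
```

Save the output as cc_config.xml and restart BOINC (or run boinccmd --read_cc_config) so the client picks up the flags; the extra scheduler and memory messages then show up in the event log.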
E the P · Joined: 5 Jun 06 · Posts: 36 · Credit: 28,333,251 · RAC: 0
So far it has run the entire weekend with no hangups. I'll keep monitoring and apply your suggestion if it hangs.
E the P · Joined: 5 Jun 06 · Posts: 36 · Credit: 28,333,251 · RAC: 0
As a final wrap-up: it has now been processing for about 4-5 days without a termination. At this point I can live with that. I want to thank everyone on this list for your suggestions and help.
Kong Kandal den 1. · Joined: 28 Apr 06 · Posts: 1 · Credit: 9,024,376 · RAC: 0
Hello, I am experiencing the same problem and have not found any solution. I have tried all the tricks in this thread, but nothing seems to help. Any advice will be appreciated. Thank you.
Link · Joined: 4 May 07 · Posts: 356 · Credit: 382,349 · RAC: 0
To get any advice you need to unhide your computers or post a link to the details page of the affected machine. Also post the contents of your global_prefs_override.xml or, if that file is not present in your BOINC data directory, global_prefs.xml.
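A small convenience sketch for finding the file Link asks for: it returns global_prefs_override.xml if that exists in the data directory, otherwise global_prefs.xml. The function names prefs_file_to_post and print_prefs_file are hypothetical, and you have to supply your own BOINC data directory path.

```python
from pathlib import Path
from typing import Optional

def prefs_file_to_post(data_dir: str) -> Optional[Path]:
    """Return the preferences file to post, or None if neither exists."""
    base = Path(data_dir)
    # The local override file takes precedence over global_prefs.xml.
    for name in ("global_prefs_override.xml", "global_prefs.xml"):
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None

def print_prefs_file(data_dir: str) -> None:
    """Print the chosen file's name and contents, ready to paste into a post."""
    path = prefs_file_to_post(data_dir)
    if path is None:
        print(f"No preferences file found in {data_dir}")
    else:
        print(f"--- {path.name} ---")
        print(path.read_text(encoding="utf-8"))
```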
Xenus · Joined: 14 May 09 · Posts: 2 · Credit: 664,614 · RAC: 32
Quote: "I'm running BOINC on an Ubuntu 12 system and about 6-8 weeks ago it began to develop a problem (no new software/hardware changes). It will frequently get stuck with one job at the "Waiting to Run" state. If I manually abort that work unit it will begin to run the next job normally. The pattern is inconsistent. Sometimes it will process 2-4 work units just fine, other times it will hang on 2-3 in a row. Any thoughts?"
Exactly the same problem on Ubuntu 12.04 and 12.10 with BOINC 7.0.27 and Rosetta tasks. The next task also gets stuck on "Waiting to Run" for no good reason. Aborting that task then gets the "Ready to Start" tasks running.
Xenus · Joined: 14 May 09 · Posts: 2 · Credit: 664,614 · RAC: 32
Looks like the max memory issue: increasing the percentage of usable memory gets the process running again. It seems the Rosetta jobs have large and/or varying memory requirements. Ideally there should be a log message indicating that a job can't run without more memory, or the job should simply abort itself to allow another job to run.
Link · Joined: 4 May 07 · Posts: 356 · Credit: 382,349 · RAC: 0
You need to allow at least 500 MB per Rosetta task, better 1 GB, since some tasks need that much. If the issue comes back, check all the posts in this thread; all the relevant settings have been posted above.
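Link's rule of thumb can be turned into a quick calculation of the minimum RAM percentage to allow. A minimal sketch, assuming the number of concurrent tasks is roughly the CPU count scaled by max_ncpus_pct (rounded down, at least one) and a fixed per-task memory budget; real Rosetta tasks vary, so leave some headroom. The function names and the example machine (4 CPUs, 4 GB of RAM) are hypothetical; max_ncpus_pct=50 matches the override file posted earlier in the thread.

```python
import math

def concurrent_tasks(ncpus: int, max_ncpus_pct: float) -> int:
    """Rough count of CPU tasks run at once (assumed: floor, at least 1)."""
    return max(1, math.floor(ncpus * max_ncpus_pct / 100.0))

def min_ram_pct(n_tasks: int, per_task_mb: float, total_ram_mb: float) -> float:
    """Smallest RAM percentage that fits n_tasks of per_task_mb each."""
    return 100.0 * n_tasks * per_task_mb / total_ram_mb

n = concurrent_tasks(4, 50.0)
print(n, min_ram_pct(n, 1024, 4096), min_ram_pct(n, 500, 4096))
```

On that hypothetical machine, 1 GB per task already needs 50% of RAM, so a busy limit of 50% leaves no margin, while 500 MB per task needs about 24.4%. Set ram_max_used_busy_pct and ram_max_used_idle_pct comfortably above the computed value.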
©2024 University of Washington
https://www.bakerlab.org