Message boards : Number crunching : Waiting to Run
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> I put this computer in a new group and told it to use only one CPU and 50% of memory. I'll report back to see if this does anything.

Sorry, no luck. I'm still getting jobs hung up with "Waiting to Run", which prevents any other job from loading.
Link (Joined: 4 May 07, Posts: 356, Credit: 382,349, RAC: 0)
> Sorry, no luck. I'm still getting jobs hung up with "Waiting to Run", which prevents any other job from loading.

How many jobs is BOINC starting? How many are "waiting to run"? With 50% of the processors on a dual-core machine, you should have no more than one single WU running. Please post your log (from when BOINC is suspending WUs) so we can see what is going on.
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> Sorry, no luck. I'm still getting jobs hung up with "Waiting to Run", which prevents any other job from loading.

How do I get the log out of the client?
Link (Joined: 4 May 07, Posts: 356, Credit: 382,349, RAC: 0)
In the BOINC data directory (i.e. wherever client_state.xml is; don't ask me where that is on a Linux system) there should be a file called stdoutdae.txt. That's the log. That is, by the way, one of the reasons why I recommended configuring access from another machine with BOINC Manager on it; that makes things a lot easier.
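For the Linux question: the client's own startup lines, posted later in this thread, report its data directory as /var/lib/boinc-client. A minimal Python sketch for locating stdoutdae.txt, assuming a short list of candidate directories; only /var/lib/boinc-client is confirmed here, and /var/lib/boinc is a guess for other distributions:

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate BOINC data directories. Only /var/lib/boinc-client is confirmed by the
# log in this thread; /var/lib/boinc is an assumption for other packagings.
DEFAULT_CANDIDATES = (
    Path("/var/lib/boinc-client"),
    Path("/var/lib/boinc"),
)


def find_boinc_log(candidates: Iterable[Path] = DEFAULT_CANDIDATES) -> Optional[Path]:
    """Return the first stdoutdae.txt found in the candidate data directories, else None."""
    for data_dir in candidates:
        log = Path(data_dir) / "stdoutdae.txt"
        try:
            if log.is_file():
                return log
        except OSError:  # e.g. no permission to look inside the directory
            continue
    return None
```

Calling `find_boinc_log()` returns a path you can open in a pager or editor; depending on how the package sets ownership, reading it may need elevated permissions.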
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> In the BOINC data directory (i.e. wherever client_state.xml is; don't ask me where that is on a Linux system) there should be a file called stdoutdae.txt. That's the log.

I found the file, but it's rather large, and this message board doesn't appear to have the ability to add attachments. Should I look for something specific and cut/paste it into a message?
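Rather than posting the whole file, one option is to keep only the lines that bear on the problem: suspend/resume messages, task starts and restarts, and the preference summary the client prints at startup. A minimal Python sketch; the keyword list is an assumption, picked from messages that appear in the log excerpt in this thread, and `tail` caps the output at the most recent matches:

```python
from collections import deque
from typing import Iterable, List, Sequence

# Message fragments worth posting; chosen from lines seen in this thread's log.
KEYWORDS = (
    "Suspending computation",
    "Resuming computation",
    "Starting task",
    "Restarting task",
    "Computation for task",
    "max memory usage",
    "suspend work if",
)


def relevant_lines(lines: Iterable[str], keywords: Sequence[str] = KEYWORDS,
                   tail: int = 200) -> List[str]:
    """Return at most the last `tail` lines that contain any of the keywords."""
    recent = deque(maxlen=tail)  # drops the oldest match once full
    for line in lines:
        if any(k in line for k in keywords):
            recent.append(line.rstrip("\n"))
    return list(recent)
```

Usage: `with open("/var/lib/boinc-client/stdoutdae.txt", errors="replace") as f: print("\n".join(relevant_lines(f)))`, then paste the result into a post.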
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> In the BOINC data directory (i.e. wherever client_state.xml is; don't ask me where that is on a Linux system) there should be a file called stdoutdae.txt. That's the log.

I think I found a relevant chunk. Take a look:
---------------------------------------------------------
Initialization completed
19-Nov-2012 07:52:07 [---] Running CPU benchmarks
19-Nov-2012 07:52:07 [---] Suspending computation - CPU benchmarks in progress
19-Nov-2012 07:52:07 [---] Running CPU benchmarks
19-Nov-2012 07:52:07 [---] Running CPU benchmarks
19-Nov-2012 07:52:38 [---] Benchmark results:
19-Nov-2012 07:52:38 [---] Number of CPUs: 2
19-Nov-2012 07:52:38 [---] 1099 floating point MIPS (Whetstone) per CPU
19-Nov-2012 07:52:38 [---] 1747 integer MIPS (Dhrystone) per CPU
19-Nov-2012 07:52:39 [rosetta@home] Restarting task Ccyst5_d4_0001_abinitio_SAVE_ALL_OUT_64214_2936_0 using minirosetta version 345 in slot 1
19-Nov-2012 07:53:58 [---] Suspending computation - CPU is busy
19-Nov-2012 07:54:08 [---] Resuming computation
19-Nov-2012 07:54:59 [---] Suspending computation - CPU is busy
19-Nov-2012 07:55:19 [---] Resuming computation
19-Nov-2012 07:55:29 [---] Suspending computation - CPU is busy
19-Nov-2012 07:55:40 [---] Resuming computation
19-Nov-2012 07:59:11 [---] Suspending computation - CPU is busy
19-Nov-2012 07:59:21 [---] Resuming computation
19-Nov-2012 08:00:02 [---] Suspending computation - CPU is busy
19-Nov-2012 08:00:13 [---] Resuming computation
19-Nov-2012 08:00:43 [---] Suspending computation - CPU is busy
19-Nov-2012 08:00:53 [---] Resuming computation
19-Nov-2012 08:03:15 [---] Suspending computation - CPU is busy
19-Nov-2012 08:03:25 [---] Resuming computation
19-Nov-2012 08:03:37 [---] Received signal 15
19-Nov-2012 08:03:37 [---] Exit requested by user
19-Nov-2012 08:04:33 [---] Starting BOINC client version 7.0.27 for i686-pc-linux-gnu
19-Nov-2012 08:04:33 [---] log flags: file_xfer, sched_ops, task
19-Nov-2012 08:04:33 [---] Libraries: libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
19-Nov-2012 08:04:33 [---] Data directory: /var/lib/boinc-client
19-Nov-2012 08:04:33 [---] Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.00GHz [Family 15 Model 4 Stepping 9]
19-Nov-2012 08:04:33 [---] Processor: 1.00 MB cache
19-Nov-2012 08:04:33 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts pni dtes64 monitor ds_cpl cid cx16 xtpr lahf_lm
19-Nov-2012 08:04:33 [---] OS: Linux: 3.2.0-33-generic
19-Nov-2012 08:04:33 [---] Memory: 992.24 MB physical, 1.57 GB virtual
19-Nov-2012 08:04:33 [---] Disk: 35.14 GB total, 27.14 GB free
19-Nov-2012 08:04:33 [---] Local time is UTC -5 hours
19-Nov-2012 08:04:33 [---] No usable GPUs found
19-Nov-2012 08:04:33 [---] Config: GUI RPC allowed from:
19-Nov-2012 08:04:33 [---] A new version of BOINC is available. <a href=http://boinc.berkeley.edu/download.php>Download it.</a>
19-Nov-2012 08:04:33 [rosetta@home] URL https://boinc.bakerlab.org/rosetta/; Computer ID 1340220; resource share 100
19-Nov-2012 08:04:33 [rosetta@home] General prefs: from rosetta@home (last modified 09-Nov-2012 12:50:10)
19-Nov-2012 08:04:33 [rosetta@home] Computer location: school
19-Nov-2012 08:04:33 [---] General prefs: using separate prefs for school
19-Nov-2012 08:04:33 [---] Reading preferences override file
19-Nov-2012 08:04:33 [---] Preferences:
19-Nov-2012 08:04:33 [---] max memory usage when active: 496.12MB
19-Nov-2012 08:04:33 [---] max memory usage when idle: 893.02MB
19-Nov-2012 08:04:33 [---] max disk usage: 17.57GB
19-Nov-2012 08:04:33 [---] don't use GPU while active
19-Nov-2012 08:04:33 [---] suspend work if non-BOINC CPU load exceeds 25 %
19-Nov-2012 08:04:33 [---] (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
19-Nov-2012 08:04:33 [---] Not using a proxy
Initialization completed
19-Nov-2012 08:04:45 [rosetta@home] Restarting task Ccyst5_d4_0001_abinitio_SAVE_ALL_OUT_64214_2936_0 using minirosetta version 345 in slot 1
19-Nov-2012 08:49:04 [rosetta@home] Sending scheduler request: To fetch work.
19-Nov-2012 08:49:04 [rosetta@home] Reporting 3 completed tasks, requesting new tasks for CPU
19-Nov-2012 08:49:06 [rosetta@home] Scheduler request completed: got 4 new tasks
19-Nov-2012 08:49:08 [rosetta@home] Started download of flags_rb_11_18_34278_65937__t000__0_C1_robetta
19-Nov-2012 08:49:08 [rosetta@home] Started download of input_rb_11_18_34278_65937__t000__0_C1_robetta.zip
19-Nov-2012 08:49:10 [rosetta@home] Finished download of flags_rb_11_18_34278_65937__t000__0_C1_robetta
19-Nov-2012 08:49:21 [rosetta@home] Finished download of input_rb_11_18_34278_65937__t000__0_C1_robetta.zip
19-Nov-2012 10:36:40 [rosetta@home] Computation for task Ccyst5_d4_0001_abinitio_SAVE_ALL_OUT_64214_2936_0 finished
19-Nov-2012 10:36:40 [rosetta@home] Starting task Ploop4_3_1_1_1_abinitio_design_relax_y039_005_63433_50_2 using minirosetta version 345 in slot 0
19-Nov-2012 10:36:40 [rosetta@home] Starting task rb_11_18_34278_65937__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_64586_1666_0 using minirosetta version 345 in slot 1
19-Nov-2012 10:36:42 [rosetta@home] Started upload of Ccyst5_d4_0001_abinitio_SAVE_ALL_OUT_64214_2936_0_0
19-Nov-2012 10:36:46 [rosetta@home] Finished upload of Ccyst5_d4_0001_abinitio_SAVE_ALL_OUT_64214_2936_0_0
19-Nov-2012 13:34:47 [rosetta@home] Computation for task Ploop4_3_1_1_1_abinitio_design_relax_y039_005_63433_50_2 finished
19-Nov-2012 13:34:47 [rosetta@home] Restarting task rb_11_18_34278_65937__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_64586_1666_0 using minirosetta version 345 in slot 1
19-Nov-2012 13:34:50 [rosetta@home] Started upload of Ploop4_3_1_1_1_abinitio_design_relax_y039_005_63433_50_2_0
19-Nov-2012 13:34:56 [rosetta@home] Finished upload of Ploop4_3_1_1_1_abinitio_design_relax_y039_005_63433_50_2_0
19-Nov-2012 15:58:06 [---] Suspending computation - CPU is busy
19-Nov-2012 15:58:26 [---] Resuming computation
19-Nov-2012 16:06:19 [---] Suspending computation - CPU is busy
19-Nov-2012 16:06:29 [---] Resuming computation
19-Nov-2012 16:12:21 [---] Suspending computation - CPU is busy
19-Nov-2012 16:12:31 [---] Resuming computation
19-Nov-2012 16:12:41 [---] Suspending computation - CPU is busy
19-Nov-2012 16:12:51 [---] Resuming computation
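The excerpt shows the client repeatedly pausing with "CPU is busy" and resuming 10 to 20 seconds later, which matches the "suspend work if non-BOINC CPU load exceeds 25 %" preference it printed at startup. A minimal Python sketch that pairs each such suspend line with the next resume line and reports how long computation was paused; it assumes the DD-Mon-YYYY HH:MM:SS timestamp format seen in this log:

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Timestamp prefix used in the stdoutdae.txt excerpt, e.g. "19-Nov-2012 15:58:06".
STAMP = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})")
FMT = "%d-%b-%Y %H:%M:%S"  # %b assumes English month abbreviations


def suspend_intervals(lines: Iterable[str]) -> List[Tuple[datetime, float]]:
    """Return (suspend time, paused seconds) for each 'CPU is busy' pause that was resumed."""
    intervals: List[Tuple[datetime, float]] = []
    suspended_at: Optional[datetime] = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # e.g. the untimestamped "Initialization completed"
        when = datetime.strptime(m.group(1), FMT)
        if "Suspending computation - CPU is busy" in line:
            suspended_at = when
        elif "Resuming computation" in line and suspended_at is not None:
            intervals.append((suspended_at, (when - suspended_at).total_seconds()))
            suspended_at = None
    return intervals
```

Run over the excerpt above, every pause comes out between 10 and 20 seconds; the task itself was never suspended for long, which points at the "Waiting to Run" state having a different cause.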
dcdc (Joined: 3 Nov 05, Posts: 1832, Credit: 119,677,840, RAC: 9,700)
So you can either change the 25% preference to 0% in BOINC (Tools > Computing Preferences > Processor Usage > "While processor usage is less than 0 percent"), or choose "Run always" from the Activity menu.

HTH
Danny
Mod.Sense (Volunteer moderator; Joined: 22 Aug 06, Posts: 4018, Credit: 0, RAC: 0)
...or you could set the value to a higher percentage of busy CPU before BOINC is asked to suspend things. If this machine gets sluggish when you run a task that consumes 70% of the CPU or so, then set the threshold at 65% so that BOINC yields the machine more completely while that task is running.

Rosetta Moderator: Mod.Sense
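The rule being tuned here is: BOINC pauses its work while CPU load from other programs exceeds the threshold, and a threshold of 0 turns the check off (which is why 0% is suggested as one fix). A simplified Python model of that decision, not BOINC's actual code; how the real client samples and times the load measurement is not modeled:

```python
def should_suspend(non_boinc_load_pct: float, suspend_cpu_usage: float) -> bool:
    """Simplified model of the <suspend_cpu_usage> preference.

    A threshold of 0 disables the check; otherwise BOINC work is paused while
    CPU load from non-BOINC programs exceeds the threshold.
    """
    if suspend_cpu_usage <= 0:
        return False
    return non_boinc_load_pct > suspend_cpu_usage
```

With the 25% threshold from the log, a background program using 30% of the CPU already trips the check; at 65%, only something heavier, like the 70% task in the example, does.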
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> So you can either change the 25% preference to 0% in BOINC:

This is the Linux version; there is no menu to select local preferences. Do you know how to set this in the Linux environment?
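On a Linux machine without BOINC Manager, the boinccmd command-line tool that ships with the client can stand in for the Activity menu: `boinccmd --set_run_mode always` corresponds to "Run always". A minimal Python wrapper, assuming boinccmd is on the PATH and is run from the data directory so it can pick up the RPC password from gui_rpc_auth.cfg there (the directory default below is the one from this thread's log):

```python
import subprocess
from typing import List, Optional

VALID_MODES = ("always", "auto", "never")


def run_mode_command(mode: str, duration_s: Optional[int] = None) -> List[str]:
    """Build the boinccmd argument list that sets the computing run mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    cmd = ["boinccmd", "--set_run_mode", mode]
    if duration_s is not None:
        cmd.append(str(duration_s))  # optional: revert after this many seconds
    return cmd


def set_run_mode(mode: str, data_dir: str = "/var/lib/boinc-client") -> None:
    """Run boinccmd from the data directory so it can read gui_rpc_auth.cfg there."""
    subprocess.run(run_mode_command(mode), cwd=data_dir, check=True)
```

`set_run_mode("auto")` switches back to following the preferences again.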
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> ...or you could set the value to a higher percentage of busy CPU before BOINC is asked to suspend things. If this machine gets sluggish when you run a task that consumes 70% of the CPU or so, then set the threshold at 65% so that BOINC yields the machine more completely while that task is running.

How/where do I set this?
Mod.Sense (Volunteer moderator; Joined: 22 Aug 06, Posts: 4018, Credit: 0, RAC: 0)
There are some .xml files on the client machine. The tag we're talking about is called <suspend_cpu_usage>. For example, to suspend when the CPU reaches 77% busy:

<suspend_cpu_usage>77.000000</suspend_cpu_usage>

I believe the one you want to change is in a file called global_prefs_override.xml. Make a backup of the file, then just use your editor of choice to modify that line.

Rosetta Moderator: Mod.Sense
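The edit described here amounts to setting one child element of <global_preferences>, adding it if it is missing. A minimal Python sketch using the standard library's ElementTree; it writes a .bak copy first, as suggested, and is meant for the numeric percentage prefs discussed in this thread (0/1 flags such as leave_apps_in_memory are better edited by hand). After saving, the client still has to re-read the file; `boinccmd --read_global_prefs_override` asks it to do that, or restart the client:

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_global_pref(path: Path, tag: str, value: float) -> None:
    """Set <tag>value</tag> in global_prefs_override.xml, adding the element if missing."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup before editing
    tree = ET.parse(path)
    root = tree.getroot()  # expected to be <global_preferences>
    elem = root.find(tag)
    if elem is None:
        elem = ET.SubElement(root, tag)
    elem.text = f"{value:.6f}"  # same six-decimal style as the existing entries
    tree.write(path)
```

For example, `set_global_pref(Path("/var/lib/boinc-client/global_prefs_override.xml"), "suspend_cpu_usage", 77)` writes the line from the example above.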
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> There are some .xml files on the client machine. The tag we're talking about is called <suspend_cpu_usage>. For example, to suspend when the CPU reaches 77% busy:

There was no such line in that file, so I added the following:

<suspend_cpu_usage>90.000000</suspend_cpu_usage>

I will report back on the results.
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> There are some .xml files on the client machine. The tag we're talking about is called <suspend_cpu_usage>. For example, to suspend when the CPU reaches 77% busy:

No luck, still the same problem. I did notice a pattern, though. I "abort" the stuck job and then two jobs begin to run (two CPUs). They both run for a while, and then one of them briefly flashes "waiting for memory" and quickly jumps to "waiting to run". Once that happens, that job stays "stuck" forever and prevents other jobs from starting once the first one has completed.
Link (Joined: 4 May 07, Posts: 356, Credit: 382,349, RAC: 0)
> No luck, still the same problem. I did notice a pattern, though. I "abort" the stuck job and then two jobs begin to run (two CPUs). They both run for a while, and then one of them briefly flashes "waiting for memory" and quickly jumps to "waiting to run". Once that happens, that job stays "stuck" forever and prevents other jobs from starting once the first one has completed.

That shows you are still running two WUs at once. In your global_prefs_override.xml you should have a line like this:

<max_ncpus_pct>100.000000</max_ncpus_pct>

Change it to:

<max_ncpus_pct>50.000000</max_ncpus_pct>

Since that leaves one virtual core free, you can also set <suspend_cpu_usage> to 0, like this:

<suspend_cpu_usage>0.000000</suspend_cpu_usage>

Perhaps you could post the entire global_prefs_override.xml; maybe there's more to optimize.
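The arithmetic behind this suggestion: max_ncpus_pct is a percentage of the CPUs the client detects (2 on this machine, per "Number of CPUs: 2" in the log), so 50% leaves it one CPU and therefore one running WU. A minimal sketch of that calculation; rounding down and never going below one CPU are assumptions about how the client handles fractions, not something confirmed in this thread:

```python
import math


def usable_cpus(detected_cpus: int, max_ncpus_pct: float) -> int:
    """CPUs BOINC may use at the given max_ncpus_pct (assumed: round down, minimum 1)."""
    return max(1, math.floor(detected_cpus * max_ncpus_pct / 100))
```

On the two-CPU machine in this thread, 100% gives 2 concurrent tasks and 50% gives 1.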
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> No luck, still the same problem. I did notice a pattern, though. I "abort" the stuck job and then two jobs begin to run (two CPUs). They both run for a while, and then one of them briefly flashes "waiting for memory" and quickly jumps to "waiting to run". Once that happens, that job stays "stuck" forever and prevents other jobs from starting once the first one has completed.

Here is the new updated file:
-----------------------------------------
<global_preferences>
  <run_on_batteries>0</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>0</run_gpu_if_user_active>
  <idle_time_to_run>0.000000</idle_time_to_run>
  <start_hour>0.000000</start_hour>
  <end_hour>0.000000</end_hour>
  <net_start_hour>0.000000</net_start_hour>
  <net_end_hour>0.000000</net_end_hour>
  <leave_apps_in_memory>0</leave_apps_in_memory>
  <confirm_before_connecting>0</confirm_before_connecting>
  <hangup_if_dialed>0</hangup_if_dialed>
  <dont_verify_images>0</dont_verify_images>
  <work_buf_min_days>0.100000</work_buf_min_days>
  <work_buf_additional_days>0.250000</work_buf_additional_days>
  <max_ncpus_pct>100.000000</max_ncpus_pct>
  <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
  <disk_interval>60.000000</disk_interval>
  <disk_max_used_gb>100.000000</disk_max_used_gb>
  <disk_max_used_pct>50.000000</disk_max_used_pct>
  <disk_min_free_gb>0.000000</disk_min_free_gb>
  <vm_max_used_pct>75.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
  <max_bytes_sec_up>0.000000</max_bytes_sec_up>
  <max_bytes_sec_down>0.000000</max_bytes_sec_down>
  <cpu_usage_limit>50.000000</cpu_usage_limit>
  <suspend_cpu_usage>0.000000</suspend_cpu_usage>
</global_preferences>
dcdc (Joined: 3 Nov 05, Posts: 1832, Credit: 119,677,840, RAC: 9,700)
I *think* <disk_min_free_gb>0.000000</disk_min_free_gb> needs to be higher than 0, otherwise it's ignored.

Also (again, I'm not sure), I think <leave_apps_in_memory>0</leave_apps_in_memory> would be better as 1, unless you can't use that setting for some RAM- or paging-related reason.
Link (Joined: 4 May 07, Posts: 356, Credit: 382,349, RAC: 0)
Try this:

<leave_apps_in_memory>1</leave_apps_in_memory>
<ram_max_used_busy_pct>65.000000</ram_max_used_busy_pct>
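The ram_max_used_busy_pct change can be checked against the log: the client reported 992.24 MB of physical memory and "max memory usage when active: 496.12MB", exactly 50% of it (the idle limit, 893.02MB, is 90%). When running tasks together need more than that limit, one of them would be expected to show "waiting for memory", the state reported above. A small Python sketch of the calculation; the working-set sizes of the Rosetta tasks are not given in this thread, so the task sizes in the usage note are hypothetical:

```python
from typing import Sequence


def ram_limit_mb(physical_mb: float, pct: float) -> float:
    """Memory BOINC may use, in the form shown by the 'max memory usage' log lines."""
    return round(physical_mb * pct / 100, 2)


def tasks_fit(limit_mb: float, task_sizes_mb: Sequence[float]) -> bool:
    """True if the combined working sets of the running tasks stay within the limit."""
    return sum(task_sizes_mb) <= limit_mb
```

For instance, with two hypothetical 300 MB tasks, the 496.12 MB limit at 50% is exceeded, while the 644.96 MB limit at 65% is not.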
dcdc (Joined: 3 Nov 05, Posts: 1832, Credit: 119,677,840, RAC: 9,700)
Actually, shouldn't these be the other way around?

<max_ncpus_pct>100.000000</max_ncpus_pct>
<cpu_usage_limit>50.000000</cpu_usage_limit>

I think max_ncpus_pct is the percentage of processors to use (so 50% to use one of the two CPUs), and cpu_usage_limit is the percentage of time a running task is allowed to run rather than pause. I'd recommend swapping those values and getting BOINC to re-read the file.
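The two settings answer different questions: max_ncpus_pct limits how many CPUs run tasks, while cpu_usage_limit throttles running tasks by pausing them part of the time. The file posted above (100 and 50) therefore runs two tasks that are each paused about half the time, instead of one task at full speed. A minimal Python sketch that reads those two values and flags that combination; treating a missing element as 100 is an assumption, and the message texts are illustrative:

```python
import xml.etree.ElementTree as ET
from typing import List


def check_cpu_prefs(xml_text: str) -> List[str]:
    """Flag the max_ncpus_pct / cpu_usage_limit combination discussed above."""
    root = ET.fromstring(xml_text)

    def pref(tag: str, default: float = 100.0) -> float:
        elem = root.find(tag)  # missing element: assume the default of 100
        return float(elem.text) if elem is not None and elem.text else default

    warnings = []
    usage_limit = pref("cpu_usage_limit")
    if usage_limit < 100:
        warnings.append(f"cpu_usage_limit={usage_limit:g}: running tasks are paused part of the time")
    if pref("max_ncpus_pct") >= 100:
        warnings.append("max_ncpus_pct=100: every detected CPU may run a task")
    return warnings
```

Pasting the posted file in as a string yields both warnings; with the two values swapped, the list comes back empty.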
Link (Joined: 4 May 07, Posts: 356, Credit: 382,349, RAC: 0)
> Actually, shouldn't these be the other way around?

Yep. Hadn't seen that.
E the P (Joined: 5 Jun 06, Posts: 36, Credit: 28,333,251, RAC: 0)
> Actually, shouldn't these be the other way around?

Well, I was a bit hopeful. With the following changes (see my current settings below), it ran for about 1 1/2 days, and then one job hit "waiting to run".
-------------------------------
<global_preferences>
  <run_on_batteries>0</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>0</run_gpu_if_user_active>
  <idle_time_to_run>0.000000</idle_time_to_run>
  <start_hour>0.000000</start_hour>
  <end_hour>0.000000</end_hour>
  <net_start_hour>0.000000</net_start_hour>
  <net_end_hour>0.000000</net_end_hour>
  <leave_apps_in_memory>0</leave_apps_in_memory>
  <confirm_before_connecting>0</confirm_before_connecting>
  <hangup_if_dialed>0</hangup_if_dialed>
  <dont_verify_images>0</dont_verify_images>
  <work_buf_min_days>0.100000</work_buf_min_days>
  <work_buf_additional_days>0.250000</work_buf_additional_days>
  <max_ncpus_pct>50.000000</max_ncpus_pct>
  <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
  <disk_interval>60.000000</disk_interval>
  <disk_max_used_gb>100.000000</disk_max_used_gb>
  <disk_max_used_pct>50.000000</disk_max_used_pct>
  <disk_min_free_gb>0.000000</disk_min_free_gb>
  <vm_max_used_pct>75.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
  <max_bytes_sec_up>0.000000</max_bytes_sec_up>
  <max_bytes_sec_down>0.000000</max_bytes_sec_down>
  <cpu_usage_limit>100.000000</cpu_usage_limit>
  <suspend_cpu_usage>0.000000</suspend_cpu_usage>
</global_preferences>
©2024 University of Washington
https://www.bakerlab.org