Message boards : Number crunching : linux script - stopatcheckpoint
Author | Message |
---|---|
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
hi,

This is a bash script to stop BOINC soon after the running task checkpoints. One reason for writing it was to demonstrate the usefulness of the boinc_cmd utility that comes with the Linux implementation of BOINC. The other motivation is described under Background, below.

Background

You will know that, in contrast to many other BOINC projects, Rosetta checkpoints its results rarely. This is a nuisance because when you stop BOINC for any reason, work done since the last checkpoint is lost. BOINC and Rosetta between them recover from the situation when BOINC is restarted, but the lost work needs to be recalculated.

Rosetta and CPDN are the two projects I run where this is an issue. On CPDN it is a particular issue as the project encourages users to back up their work every week or so, and doing that involves stopping BOINC. On both Rosetta and CPDN, it may be wise to stop BOINC before doing computationally excessive work as this might cause the BOINC application. You may want to shut down the machine to go on holiday, install a new DVD drive, etc.

This script waits for the next checkpoint, stops BOINC, and exits. One use would be to reboot or shut down after the next checkpoint:

    $ ~boinc/stopatcheckpoint; reboot
    $ ~boinc/stopatcheckpoint; halt

Another use would be to run some other work, then restart BOINC, as in the following command line:

    ~boinc/stopatcheckpoint; /pi/timpi 22 22; /etc/init.d/boinc start

(Note that I've got a Debian-style start command for BOINC here; on other systems you will need to work out your own version of this.)

The script

Copy and paste the following into a file named stopatcheckpoint in the BOINC directory. The script needs to be run from there so that the boinc_cmd utility picks up its passwords. If you need to run it from elsewhere, either copy the password file across, or build the passwords into the script.
After pasting the script into your file system, you need to make it runnable, i.e. some variation of

    chmod 755 stopatcheckpoint

    #!/bin/sh
    #
    # stopatcheckpoint
    #
    # Author River 2007
    #
    # Copyright but may be distributed under the GPL
    # - see http://www.gnu.org/licenses/gpl.txt
    #
    # River asserts his moral right to be identified as the Author

    echo "wait for BOINC checkpoint..."
    cd ~boinc || exit 1

    # remember the current checkpoint line, skipping zero times
    prv=`./boinc_cmd --get_results | grep -v ': 0\.0000' | grep checkpoint`

    # loop for as long as that checkpoint line is still being reported
    while ./boinc_cmd --get_results | grep -F "$prv" >/dev/null ; do
      clear
      # -E is needed so that | means "or" in the pattern
      ./boinc_cmd --get_results | grep -v ': 0\.0000' | grep -E "time|----"
      echo "waiting for change to:$prv ..."
      sleep 3
    done

    echo
    echo "*** new checkpoint ***"
    echo
    ./boinc_cmd --get_results | grep -v ': 0\.0000' | grep -E "time|----"
    echo
    echo "stopping BOINC..."
    echo
    ./boinc_cmd --quit

Sample output

While waiting, the screen displays some info on every WU on the machine:

    1) -----------
       checkpoint CPU time: 25081.809985
       current CPU time: 26803.275281
       estimated CPU time remaining: 57799.145096
    2) -----------
       estimated CPU time remaining: 78446.605958
    waiting for change to: checkpoint CPU time: 25081.809985 ...

Here, WU 1 is running and WU 2 is waiting to run. If a WU is finished then the final CPU time is shown. If there is more than one checkpoint time in the list, there may be problems, as described below. The current CPU time and time remaining should be changing every few seconds.

After the checkpoint is reached the screen is left with a display like:

    1) -----------
       checkpoint CPU time: 25081.809985
       current CPU time: 26878.500846
       estimated CPU time remaining: 57852.131426
    2) -----------
       estimated CPU time remaining: 78446.605958
    waiting for change to: checkpoint CPU time: 25081.809985 ...

    *** new checkpoint ***

    1) -----------
       checkpoint CPU time: 26879.487696
       current CPU time: 26881.467395
       estimated CPU time remaining: 55478.468342
    2) -----------
       estimated CPU time remaining: 78446.605958

    stopping BOINC...
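The check at the heart of the script (has the remembered checkpoint time changed yet?) can also be pulled out into small helpers that read the task listing on standard input. This is only a sketch: the function names are made up for illustration and are not part of BOINC, and it assumes the listing looks like the sample output above, with the time as the last field on each "checkpoint CPU time:" line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of BOINC. They read text in the style of
# `boinc_cmd --get_results` on stdin (layout assumed from the sample output).

# Print the checkpoint time of every task that has checkpointed at least
# once; a time of zero is taken to mean "no checkpoint yet".
started_checkpoint_times() {
    awk '/checkpoint CPU time:/ && $NF + 0 > 0 { print $NF }'
}

# Exit 0 while the checkpoint time given as $1 is still being reported,
# and non-zero once it has been replaced by a newer one.
checkpoint_unchanged() {
    started_checkpoint_times | grep -Fxq -- "$1"
}
```

With these loaded, the wait loop could be written along the lines of `while ./boinc_cmd --get_results | checkpoint_unchanged "$prv_time"; do sleep 3; done`, where `prv_time` (again a made-up name) holds the time printed by `started_checkpoint_times` when the script started. Comparing the whole number exactly avoids relying on a substring match.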
Where this script does and doesn't work

This script works well on a single-CPU machine with only one project, and when there is only one task that has started. There can be any number of completed tasks waiting for upload, and any number of unstarted tasks waiting to run.

This script gets confused if there is more than one running / started task, for example:

- on a multi-CPU box; and
- if BOINC starts one task without finishing another, as is normal on a multi-project box, and as may happen on a single-project box since BOINC v 5.8 introduced memory management.

Hope this is useful. The script is in Bash (Unix command line) so will not port to Windows without more effort than I am going to put in.

River~~ |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
An additional limitation of this script is that it does not wait for a task that has never checkpointed at all; that is, it only waits for tasks that have already passed their first checkpoint when the script is called.

R~~ |
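The limitations described in the posts above all come down to one condition: the script behaves as intended only when exactly one task is reporting a non-zero checkpoint time. A guard for that could be sketched as below. The function names are invented for this example, and the "checkpoint CPU time:" line layout is assumed from the sample output in the first post, so treat it as a starting point and not a tested addition to the script.

```shell
#!/bin/sh
# Hypothetical guard, not part of BOINC or of the original script.
# Reads `boinc_cmd --get_results` style text on stdin.

# Count the tasks that report a non-zero checkpoint CPU time.
count_checkpointed_tasks() {
    awk '/checkpoint CPU time:/ && $NF + 0 > 0 { n++ } END { print n + 0 }'
}

# Exit 0 only when exactly one task has checkpointed, which is the one
# situation the script is described as handling well.
single_checkpointed_task() {
    [ "$(count_checkpointed_tasks)" -eq 1 ]
}
```

One way to use it would be near the top of the script, before the wait loop: `./boinc_cmd --get_results | single_checkpointed_task || { echo "need exactly one checkpointed task"; exit 1; }`. A count of zero covers the never-checkpointed case, and a count above one covers the multi-CPU and multi-project cases.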
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"...The script is in Bash (Unix command line) so will not port to windows without more effort than I am going to put in..."

but there is always someone willing to take an idea further. Cristophe has posted an .exe file for Windows users that does roughly the same thing. Nice one C! See this post

R~~ |
Kenneth Larsen Joined: 17 Sep 05 Posts: 3 Credit: 112,217 RAC: 0 |
|
©2024 University of Washington
https://www.bakerlab.org