Message boards : Rosetta@home Science : 7 WUs returned in error by computer 131283
Author | Message |
---|---|
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
These 7 WUs exited with error status 2 yesterday. While looking at my results to learn more about that error, I found that none of them had been sent to anyone else! The error on the computer in the subject line was caused by my inadvertently booting TWO OSes at the same time; both wrote simultaneously to the same hard disk space, with no CPU interlock mechanism between the independent OS writes -:( So, please re-issue these 7 WUs -:) There is no error in any of them ... only hard disk corruption. Thanks |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"These 7 WUs exited with error status 2 yesterday" Thanks for the thought - but this will happen automatically anyway. When a WU is returned with an error, or if it is not returned at all before the deadline, the server automatically sends it out again (up to a maximum of three tries, at present). This means that work would only be skipped if three different people had spurious error reports. Each project selects how many times BOINC should retry before giving up. Three tries was chosen as a sensible balance between trying too many times when there really is something wrong with the WU, and not trying enough when it is a non-WU error like this one. Actually I am not sure if it is three tries in total, or three re-tries making four in total, but you get the idea either way. We will all be able to see which it is by following this WU over the next few days, because it has already errored three times now. With thousands of participants, mistakes like this are bound to happen most days to *somebody* on the project, and just by bad luck it was your turn that day. It is good design to expect such errors and work around them automatically. If you keep an eye on those WUs, you will see them reissued to someone else in due course - or you can just forget about them and trust the system to do its thing. River~~ |
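For anyone who prefers to see the reissue rule as code, here is a minimal sketch of the behaviour described above. It is not the actual BOINC server logic: the class name `WorkUnit`, the outcome strings, and the `max_attempts` parameter are all hypothetical, and the default of 3 assumes "three tries in total" (set it to 4 if it turns out to be three re-tries).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkUnit:
    """Hypothetical model of one WU and the results reported for it."""
    name: str
    max_attempts: int = 3  # assumption: three tries in total
    outcomes: List[str] = field(default_factory=list)

    def report(self, outcome: str) -> None:
        """Record one result: "success", "error" or "timeout"."""
        self.outcomes.append(outcome)

    def should_reissue(self) -> bool:
        """Reissue only if nobody succeeded and attempts remain."""
        if "success" in self.outcomes:
            return False
        return len(self.outcomes) < self.max_attempts


wu = WorkUnit("example_wu")
wu.report("error")            # e.g. a host with a corrupted disk
print(wu.should_reissue())    # True: two attempts are still left
wu.report("timeout")
wu.report("error")
print(wu.should_reissue())    # False: the attempt limit is reached
```

Under these assumptions a WU is only dropped when every attempt fails, which matches the point that three different hosts would all need to report spurious errors.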
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
|
Bok Send message Joined: 17 Sep 05 Posts: 54 Credit: 3,514,973 RAC: 0 |
Carlos, the difference in versions is, I believe, just a fix to a Windows-only bug. There was no need to release this version for Linux. Most of my machines run Rosetta under Linux, a mixture of Gentoo and Redhat, with zero problems. Bok Free-DC Stats for all projects Custom Stats |
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
|
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Carlos, a quick note: normally you shouldn't need to use kill -9; a plain kill (which sends signal 15, i.e. SIGTERM) should be fine 99.9% of the time (unless the app itself blocks it). A normal SIGTERM (no argument, or -15) is the equivalent of shutting down Windows gracefully via Start->Shutdown (funny, like the BIOS error "<Beep!> Keyboard not found. Press any key to continue"). It lets the app write its stuff to disk etc. SIGKILL (-9) is like pulling the power plug. PS: I also had several Rosetta 4.80 WUs freeze on Linux 2.4.27 (Debian Sarge), which I had to kill (and BOINC restarted them). I've never had a WU freeze under WinXP so far. I've never had a Rosetta 4.21 (WCG/HPF) freeze so far (Linux or Win). Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
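The "SIGTERM first, SIGKILL only as a last resort" advice can be sketched in a few lines. This is an illustration only, assuming a POSIX system; the helper name `stop_gracefully` and the timeout values are made up for the example, and the child here is a throwaway Python process rather than BOINC or Rosetta.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX: the process gets a chance to clean up
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL: cannot be caught, blocked or ignored
        return proc.wait()


# Stand-in for a long-running process (hypothetical; not the BOINC client).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
code = stop_gracefully(child, timeout=5.0)
# On POSIX a negative return code means "terminated by that signal number".
print("stopped by SIGTERM:", code == -signal.SIGTERM)
```

The point of the two-step design is that SIGKILL is only reached when the process has ignored or blocked the polite request for the whole timeout.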
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
I am getting this error on some WUs under Linux 2.4.x: Rosetta freezes in RAM until I run pkill -9 boinc and then restart it manually. A Linux-only bug? crobertp [/home/boinc/BOINC] > ./boinc -redirectio -allow_remote_gui_rpc -return_results_immediately & [1] 16353 crobertp [/home/boinc/BOINC] > ssh think@matrix.cp3 ssh: connect to host matrix.cp3 port 22: Connection timed out crobertp [/home/boinc/BOINC] > w 11:48am up 14 days, 36 min, 1 user, load average: 0.67, 0.47, 0.22 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT boinc pts/3 200.149.245.172 10:45am 0.00s 2:33 0.01s w crobertp [/home/boinc/BOINC] > *** glibc detected *** corrupted double-linked list: 0x093eb028 *** crobertp [/home/boinc/BOINC] > |
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
crobertp [/home/boinc/BOINC] > ps xu USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND boinc 14309 0.0 1.3 5444 3308 ? S 04:48 0:04 ./boinc -redirectio -allow_remote_gui_rpc -return_results_imme boinc 16088 0.0 0.8 7200 2144 ? S 10:45 0:00 /usr/sbin/sshd boinc 16089 0.0 0.8 3484 2232 pts/3 S 10:45 0:00 -bash boinc 16137 24.1 23.1 109600 57416 ? SN 10:46 13:18 rosetta_4.80_i686-pc-linux-gnu cc 1fna _ -abrelax -stringent_r boinc 16138 0.0 23.1 109600 57416 ? SN 10:46 0:00 rosetta_4.80_i686-pc-linux-gnu cc 1fna _ -abrelax -stringent_r boinc 16139 0.0 23.1 109600 57416 ? SN 10:46 0:00 rosetta_4.80_i686-pc-linux-gnu cc 1fna _ -abrelax -stringent_r boinc 16345 0.0 0.2 2544 664 pts/3 R 11:41 0:00 ps xu crobertp [/home/boinc/BOINC] > pkill boinc crobertp [/home/boinc/BOINC] > ps xu USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND boinc 16088 0.0 0.8 7200 2144 ? S 10:45 0:00 /usr/sbin/sshd boinc 16089 0.0 0.8 3484 2232 pts/3 S 10:45 0:00 -bash boinc 16350 0.0 0.2 2532 652 pts/3 R 11:42 0:00 ps xu crobertp [/home/boinc/BOINC] > ./boinc -redirectio -allow_remote_gui_rpc -return_results_immediately & [1] 16353 crobertp [/home/boinc/BOINC] > ssh think@matrix.cp3 ssh: connect to host matrix.cp3 port 22: Connection timed out crobertp [/home/boinc/BOINC] > w 11:48am up 14 days, 36 min, 1 user, load average: 0.67, 0.47, 0.22 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT boinc pts/3 200.149.245.172 10:45am 0.00s 2:33 0.01s w crobertp [/home/boinc/BOINC] > *** glibc detected *** corrupted double-linked list: 0x093eb028 *** crobertp [/home/boinc/BOINC] > pkill boinc crobertp [/home/boinc/BOINC] > ps xu USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND boinc 16088 0.0 0.9 7216 2276 ? 
S 10:45 0:00 /usr/sbin/sshd boinc 16089 0.0 0.7 3484 1952 pts/3 S 10:45 0:00 -bash boinc 16549 0.0 0.2 2532 652 pts/3 R 12:39 0:00 ps xu [1]+ Done ./boinc -redirectio -allow_remote_gui_rpc -return_results_immediately crobertp [/home/boinc/BOINC] > ./boinc -redirectio -allow_remote_gui_rpc -return_results_immediately & [1] 16551 crobertp [/home/boinc/BOINC] > ps xu USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND boinc 16088 0.0 0.8 7220 2196 ? S 10:45 0:00 /usr/sbin/sshd boinc 16089 0.0 0.7 3484 1856 pts/3 S 10:45 0:00 -bash boinc 16551 0.0 1.0 4972 2592 pts/3 S 12:39 0:00 ./boinc -redirectio -allow_remote_gui_rpc -return_results_imme boinc 16553 99.7 25.1 156424 62348 pts/3 RN 12:39 8:36 rosetta_4.80_i686-pc-linux-gnu cc 1cc8 A -abrelax -stringent_r boinc 16554 0.0 25.1 156424 62348 pts/3 SN 12:39 0:00 rosetta_4.80_i686-pc-linux-gnu cc 1cc8 A -abrelax -stringent_r boinc 16555 0.0 25.1 156424 62348 pts/3 SN 12:39 0:00 rosetta_4.80_i686-pc-linux-gnu cc 1cc8 A -abrelax -stringent_r boinc 16602 0.0 0.2 2544 664 pts/3 R 12:48 0:00 ps xu crobertp [/home/boinc/BOINC] > Click signature for global team stats |
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
crobertp [/home/boinc/BOINC] > lsof -u boinc COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME sshd 16088 boinc cwd DIR 3,9 4096 2 / sshd 16088 boinc rtd DIR 3,9 4096 2 / sshd 16088 boinc txt REG 3,9 306504 456268 /usr/sbin/sshd sshd 16088 boinc mem REG 3,9 99663 232524 /lib/ld-2.3.2.so sshd 16088 boinc mem REG 3,9 31592 228843 /lib/libnss_nis-2.3.2.so sshd 16088 boinc mem REG 3,9 194800 277040 /usr/lib/libopensc.so.0.0.6 sshd 16088 boinc mem REG 3,9 33748 228098 /lib/libpam.so.0.75 sshd 16088 boinc mem REG 3,9 9064 228276 /lib/libdl-2.3.2.so sshd 16088 boinc mem REG 3,9 56684 228848 /lib/libresolv-2.3.2.so sshd 16088 boinc mem REG 3,9 7700 228070 /lib/libutil-2.3.2.so sshd 16088 boinc mem REG 3,9 51356 276901 /usr/lib/libz.so.1.1.4 sshd 16088 boinc mem REG 3,9 69132 228372 /lib/libnsl-2.3.2.so sshd 16088 boinc mem REG 3,9 968116 277204 /usr/lib/libcrypto.so.0.9.7 sshd 16088 boinc mem REG 3,9 422088 586497 /usr/lib/krb5/libkrb5.so.3.1 sshd 16088 boinc mem REG 3,9 76724 586489 /usr/lib/krb5/libk5crypto.so.3.0 sshd 16088 boinc mem REG 3,9 5468 228084 /lib/libcom_err.so.2.0 sshd 16088 boinc mem REG 3,9 1230116 228237 /lib/libc-2.3.2.so sshd 16088 boinc mem REG 3,9 55692 277030 /usr/lib/libpcsclite.so.0.0.1 sshd 16088 boinc mem REG 3,9 20276 277046 /usr/lib/libscconf.so.0.0.0 sshd 16088 boinc mem REG 3,9 101347 228847 /lib/libpthread-0.10.so sshd 16088 boinc mem REG 3,9 4132 586479 /usr/lib/krb5/libcom_err.so.3.0 sshd 16088 boinc mem CHR 1,5 162944 /dev/zero sshd 16088 boinc mem REG 3,9 4616 586434 /lib/security/pam_nologin.so sshd 16088 boinc mem REG 3,9 13616 586420 /lib/security/pam_cracklib.so sshd 16088 boinc mem REG 3,9 27480 276952 /usr/lib/libcrack.so.2.7 sshd 16088 boinc mem REG 3,9 13140 586428 /lib/security/pam_limits.so sshd 16088 boinc mem REG 3,9 42628 228839 /lib/libnss_files-2.3.2.so sshd 16088 boinc mem REG 3,9 39952 228844 /lib/libnss_nisplus-2.3.2.so sshd 16088 boinc mem REG 3,9 53296 586446 /lib/security/pam_unix.so sshd 16088 boinc mem REG 3,9 4260 
586452 /lib/security/pam_warn.so sshd 16088 boinc mem REG 3,9 3220 586421 /lib/security/pam_deny.so sshd 16088 boinc mem REG 3,9 18184 228238 /lib/libcrypt-2.3.2.so sshd 16088 boinc mem CHR 1,5 162944 /dev/zero sshd 16088 boinc 0u CHR 1,3 162942 /dev/null sshd 16088 boinc 1u CHR 1,3 162942 /dev/null sshd 16088 boinc 2u CHR 1,3 162942 /dev/null sshd 16088 boinc 3u unix 0xc5de90c0 2153261 socket sshd 16088 boinc 4r FIFO 0,4 2153263 pipe sshd 16088 boinc 5w FIFO 0,4 2153263 pipe sshd 16088 boinc 6u IPv4 2153210 TCP 212247.rjo.virtua.com.br:ssh->200.149.245.172:4864 (ESTABLISHED) sshd 16088 boinc 7r FIFO 0,4 2504 pipe sshd 16088 boinc 8w FIFO 0,4 2504 pipe sshd 16088 boinc 9u CHR 5,2 164136 /dev/ptmx sshd 16088 boinc 10u CHR 5,2 164136 /dev/ptmx sshd 16088 boinc 11u CHR 5,2 164136 /dev/ptmx sshd 16088 boinc 21w CHR 1,3 162942 /dev/null bash 16089 boinc cwd DIR 3,9 4096 505732 /home/boinc/BOINC bash 16089 boinc rtd DIR 3,9 4096 2 / bash 16089 boinc txt REG 3,9 626348 16294 /bin/bash bash 16089 boinc mem REG 3,9 99663 232524 /lib/ld-2.3.2.so bash 16089 boinc mem REG 3,9 370 537779 /usr/lib/locale/en_US/LC_IDENTIFICATION bash 16089 boinc mem REG 3,9 28 538365 /usr/lib/locale/en_US/LC_MEASUREMENT bash 16089 boinc mem REG 3,9 64 538362 /usr/lib/locale/en_US/LC_TELEPHONE bash 16089 boinc mem REG 3,9 160 538366 /usr/lib/locale/en_US/LC_ADDRESS bash 16089 boinc mem REG 3,9 82 538363 /usr/lib/locale/en_US/LC_NAME bash 16089 boinc mem REG 3,9 39 538338 /usr/lib/locale/en_US/LC_PAPER bash 16089 boinc mem REG 3,9 57 765681 /usr/lib/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES bash 16089 boinc mem REG 3,9 291 538364 /usr/lib/locale/en_US/LC_MONETARY bash 16089 boinc mem REG 3,9 21499 248830 /usr/lib/locale/en_US/LC_COLLATE bash 16089 boinc mem REG 3,9 2456 538361 /usr/lib/locale/en_US/LC_TIME bash 16089 boinc mem REG 3,9 59 244374 /usr/lib/locale/en_US/LC_NUMERIC bash 16089 boinc mem REG 3,9 5500 977493 /usr/lib/gconv/ISO8859-1.so bash 16089 boinc mem REG 3,9 252784 228080 
/lib/libncurses.so.5.2 bash 16089 boinc mem REG 3,9 9064 228276 /lib/libdl-2.3.2.so bash 16089 boinc mem REG 3,9 1230116 228237 /lib/libc-2.3.2.so bash 16089 boinc mem REG 3,9 178468 245165 /usr/lib/locale/en_US/LC_CTYPE bash 16089 boinc 0u CHR 136,3 5 /dev/pts/3 bash 16089 boinc 1u CHR 136,3 5 /dev/pts/3 bash 16089 boinc 2u CHR 136,3 5 /dev/pts/3 bash 16089 boinc 255u CHR 136,3 5 /dev/pts/3 boinc 16551 boinc cwd DIR 3,9 4096 505732 /home/boinc/BOINC boinc 16551 boinc rtd DIR 3,9 4096 2 / boinc 16551 boinc txt REG 3,9 2468096 505776 /home/boinc/BOINC/boinc boinc 16551 boinc mem DEL 0,3 31555584 /SYSV000933d7 boinc 16551 boinc mem REG 3,9 42628 228839 /lib/libnss_files-2.3.2.so boinc 16551 boinc mem REG 3,9 1230116 228237 /lib/libc-2.3.2.so boinc 16551 boinc mem REG 3,9 99663 232524 /lib/ld-2.3.2.so boinc 16551 boinc 0u CHR 136,3 5 /dev/pts/3 boinc 16551 boinc 1w REG 3,9 2044434 505771 /home/boinc/BOINC/stdoutdae.txt boinc 16551 boinc 2w REG 3,9 11791 505769 /home/boinc/BOINC/stderrdae.txt boinc 16551 boinc 3wW REG 3,9 0 505746 /home/boinc/BOINC/lockfile boinc 16551 boinc 4r DIR 3,9 4096 930768 /home/boinc/BOINC/slots boinc 16551 boinc 5u IPv4 2169693 TCP *:1043 (LISTEN) boinc 16551 boinc 6u IPv4 2169701 TCP 212247.rjo.virtua.com.br:1043->200.149.245.172:1550 (ESTABLISHED) rosetta_4 16553 boinc cwd DIR 3,9 4096 930773 /home/boinc/BOINC/slots/0 rosetta_4 16553 boinc rtd DIR 3,9 4096 2 / rosetta_4 16553 boinc txt REG 3,9 8323696 963718 /home/boinc/BOINC/projects/boinc.bakerlab.org_rosetta/rosetta_4.80_i686-pc-linux-gnu rosetta_4 16553 boinc mem DEL 0,3 31555584 /SYSV000933d7 rosetta_4 16553 boinc 0u CHR 136,3 5 /dev/pts/3 rosetta_4 16553 boinc 1w REG 3,9 153458 932310 /home/boinc/BOINC/slots/0/stdout.txt rosetta_4 16553 boinc 2w REG 3,9 587 932309 /home/boinc/BOINC/slots/0/stderr.txt rosetta_4 16553 boinc 3w REG 3,9 0 505746 /home/boinc/BOINC/lockfile rosetta_4 16553 boinc 4wW REG 3,9 0 932311 /home/boinc/BOINC/slots/0/boinc_lockfile rosetta_4 16553 boinc 5u IPv4 
2169693 TCP *:1043 (LISTEN) rosetta_4 16553 boinc 6r FIFO 0,4 2169698 pipe rosetta_4 16553 boinc 7w FIFO 0,4 2169698 pipe rosetta_4 16554 boinc cwd DIR 3,9 4096 930773 /home/boinc/BOINC/slots/0 rosetta_4 16554 boinc rtd DIR 3,9 4096 2 / rosetta_4 16554 boinc txt REG 3,9 8323696 963718 /home/boinc/BOINC/projects/boinc.bakerlab.org_rosetta/rosetta_4.80_i686-pc-linux-gnu rosetta_4 16554 boinc mem DEL 0,3 31555584 /SYSV000933d7 rosetta_4 16554 boinc 0u CHR 136,3 5 /dev/pts/3 rosetta_4 16554 boinc 1w REG 3,9 153458 932310 /home/boinc/BOINC/slots/0/stdout.txt rosetta_4 16554 boinc 2w REG 3,9 587 932309 /home/boinc/BOINC/slots/0/stderr.txt rosetta_4 16554 boinc 3w REG 3,9 0 505746 /home/boinc/BOINC/lockfile rosetta_4 16554 boinc 4w REG 3,9 0 932311 /home/boinc/BOINC/slots/0/boinc_lockfile rosetta_4 16554 boinc 5u IPv4 2169693 TCP *:1043 (LISTEN) rosetta_4 16554 boinc 6r FIFO 0,4 2169698 pipe rosetta_4 16554 boinc 7w FIFO 0,4 2169698 pipe rosetta_4 16555 boinc cwd DIR 3,9 4096 930773 /home/boinc/BOINC/slots/0 rosetta_4 16555 boinc rtd DIR 3,9 4096 2 / rosetta_4 16555 boinc txt REG 3,9 8323696 963718 /home/boinc/BOINC/projects/boinc.bakerlab.org_rosetta/rosetta_4.80_i686-pc-linux-gnu rosetta_4 16555 boinc mem DEL 0,3 31555584 /SYSV000933d7 rosetta_4 16555 boinc 0u CHR 136,3 5 /dev/pts/3 rosetta_4 16555 boinc 1w REG 3,9 153458 932310 /home/boinc/BOINC/slots/0/stdout.txt rosetta_4 16555 boinc 2w REG 3,9 587 932309 /home/boinc/BOINC/slots/0/stderr.txt rosetta_4 16555 boinc 3w REG 3,9 0 505746 /home/boinc/BOINC/lockfile rosetta_4 16555 boinc 4w REG 3,9 0 932311 /home/boinc/BOINC/slots/0/boinc_lockfile rosetta_4 16555 boinc 5u IPv4 2169693 TCP *:1043 (LISTEN) rosetta_4 16555 boinc 6r FIFO 0,4 2169698 pipe rosetta_4 16555 boinc 7w FIFO 0,4 2169698 pipe crobertp [/home/boinc/BOINC] > Click signature for global team stats |
©2024 University of Washington
https://www.bakerlab.org