Minirosetta 3.14

Message boards : Number crunching : Minirosetta 3.14

darkestkhan
Joined: 16 Nov 09
Posts: 2
Credit: 4,886
RAC: 0
Message 70645 - Posted: 26 Jun 2011, 8:19:13 UTC

Debian GNU/Linux Sid/Experimental, BOINC 6.10.56
In the middle of the night, I got:

*** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.14_x86_64-pc-linux-gnu: double free or corruption (!prev): 0x1719e408 ***
======= Backtrace: =========
[0xa449b81]
[0xa44d69b]
[0xa411111]
[0x817a794]
[0xa427a5d]
[0xa38b0ca]
[0xa38b50a]
[0xf77b9400]
[0x80501d0]
[0xa45bafc]
[0x817b9ff]
[0x8049480]
[0xa4602de]
======= Memory map: ========
08048000-0a999000 r-xp 00000000 fe:02 3260875 /home/darkestkhan/BOINC/projects/boinc.bakerlab.org_rosetta/minirosetta_3.14_x86_64-pc-linux-gnu
0a999000-0a9a0000 rwxp 02950000 fe:02 3260875 /home/darkestkhan/BOINC/projects/boinc.bakerlab.org_rosetta/minirosetta_3.14_x86_64-pc-linux-gnu
0a9a0000-0ab5c000 rwxp 00000000 00:00 0
0bbe4000-1793c000 rwxp 00000000 00:00 0 [heap]
ef900000-ef9ae000 rwxp 00000000 00:00 0
ef9ae000-efa00000 ---p 00000000 00:00 0
efa6a000-efa6b000 ---p 00000000 00:00 0
efa6b000-f0f56000 rwxp 00000000 00:00 0
f111a000-f627a000 rwxp 00000000 00:00 0
f627a000-f758e000 rwxs 00000000 fe:02 1081610 /home/darkestkhan/BOINC/slots/0/boinc_minirosetta_0
f758e000-f758f000 ---p 00000000 00:00 0
f758f000-f7592000 rwxp 00000000 00:00 0
f7592000-f7594000 rwxs 00000000 fe:02 1081606 /home/darkestkhan/BOINC/slots/0/boinc_mmap_file
f7594000-f77b9000 rwxp 00000000 00:00 0
f77b9000-f77ba000 r-xp 00000000 00:00 0 [vdso]
ff9db000-ff9fc000 rw-p 00000000 00:00 0 [stack]

cnick6
Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 70650 - Posted: 27 Jun 2011, 14:21:06 UTC

A couple of 3.14 client crashes:

https://boinc.bakerlab.org/rosetta/result.php?resultid=431970915
https://boinc.bakerlab.org/rosetta/result.php?resultid=431639167

Christoph
Joined: 10 Dec 05
Posts: 57
Credit: 1,512,386
RAC: 0
Message 70656 - Posted: 28 Jun 2011, 13:32:52 UTC

With the three stuck workunits I have at the moment, I can confirm that they seem to get stuck right after a checkpoint, maybe even at the checkpoint itself.

The three workunits are:
https://boinc.bakerlab.org/rosetta/result.php?resultid=432501911, last checkpoint: 8:38:03, cpu time: 8:38:04
https://boinc.bakerlab.org/rosetta/result.php?resultid=432531601, last checkpoint: 8:25:02, cpu time: 8:25:03
https://boinc.bakerlab.org/rosetta/result.php?resultid=432552233, last checkpoint: 6:24:59, cpu time: 6:24:59

I used Process Explorer to look at the threads and call stacks of one of them and posted them here; I hope they are somewhat helpful.

Then I noticed that one of the threads was in a suspended state. I resumed it manually, and the WU continued without problems. This worked for all three workunits I tested. It looks to me like the worker thread isn't resumed after a checkpoint is taken.
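Christoph's hypothesis, that a thread paused for the checkpoint is never resumed, can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not Rosetta's or BOINC's actual code: `ToyWorker`, `take_checkpoint` and the `skip_resume` flag are hypothetical names, and the "suspend" here is a cooperative `threading.Event` gate rather than a real OS-level thread suspension. If the resume step is skipped, the worker parks indefinitely and its progress stops right after the checkpoint, which matches the symptom above; calling `resume()` later lets it finish, much as resuming the thread by hand in Process Explorer did.

```python
import threading
import time


class ToyWorker:
    """Hypothetical worker with a cooperative suspend/resume gate (not Rosetta's or BOINC's code)."""

    def __init__(self, steps):
        self.steps = steps
        self.done = 0
        self._may_run = threading.Event()
        self._may_run.set()                 # set = may run, cleared = suspended
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        for _ in range(self.steps):
            self._may_run.wait()            # parks here for as long as the gate is cleared
            time.sleep(0.005)               # stand-in for one slice of computation
            self.done += 1

    def suspend(self):
        self._may_run.clear()               # honoured at the worker's next wait()

    def resume(self):
        self._may_run.set()


def take_checkpoint(worker, skip_resume=False):
    """Suspend the worker, (pretend to) write a checkpoint, then resume it."""
    worker.suspend()
    # ... checkpoint files would be written here ...
    if not skip_resume:
        worker.resume()                     # the step suspected to go missing
```

For example, start `w = ToyWorker(steps=200)`, call `take_checkpoint(w, skip_resume=True)`, and `w.thread.join(timeout=0.3)` returns with the thread still alive; after `w.resume()` it runs to completion.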

Tex1954
Joined: 3 Apr 11
Posts: 9
Credit: 3,394,752
RAC: 7
Message 70669 - Posted: 1 Jul 2011, 12:35:22 UTC

I'm getting a lot of computation errors lately; here are three in a row!

Task ID    Work unit  Sent                    Time reported            Server state  Outcome       Client state   CPU time (s)  Claimed credit  Granted credit
433392690  395543410  1 Jul 2011 9:29:55 UTC  1 Jul 2011 12:37:37 UTC  Over          Client error  Compute error  592.18        5.51            ---
433361641  395530375  1 Jul 2011 6:18:15 UTC  1 Jul 2011 12:20:55 UTC  Over          Client error  Compute error  519.42        4.84            ---
433327434  395499534  1 Jul 2011 2:02:12 UTC  1 Jul 2011 12:09:45 UTC  Over          Client error  Compute error  612.05        5.70            ---

Sheesh...

:D

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70671 - Posted: 1 Jul 2011, 18:31:25 UTC

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=395439475

I was the wingman on this one, and it also died on my machine from a C++ memory error.

The first person got:

process exited with code 193 (0xc1, -63)
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
what(): Error in function boost::math::normal_distribution<d>::normal_distribution: Location parameter is nan, but must be finite!
SIGABRT: abort called

I got:

- exit code -529697949 (0xe06d7363)
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812AFB
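The exit codes in these two reports are ordinary integers printed in different ways. As a worked example (a small Python sketch; `as_signed` and `msvc_cxx_exception_tag` are helper names made up for this illustration): 0xc1 is 193, and the -63 matches the same byte read as a signed 8-bit value; -529697949 is 0xe06d7363 read as a signed 32-bit value. 0xE06D7363 is the exception code the Visual C++ runtime uses for C++ exceptions, and its low three bytes spell "msc" in ASCII, which fits the "(C++ Exception)" label in the second log.

```python
def as_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def msvc_cxx_exception_tag(code):
    """Low three bytes of an MSVC C++ exception code as ASCII ('msc' for 0xE06D7363)."""
    return (code & 0xFFFFFF).to_bytes(3, "big").decode("ascii")


print(as_signed(0xc1, 8))                   # -63, the third number in "193 (0xc1, -63)"
print(as_signed(0xe06d7363, 32))            # -529697949
print(hex(-529697949 & 0xFFFFFFFF))         # 0xe06d7363
print(msvc_cxx_exception_tag(0xe06d7363))   # msc
```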

cnick6
Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 70672 - Posted: 2 Jul 2011, 0:27:51 UTC

Win64 client crash:

32.4: kd:x86> kp
ChildEBP RetAddr
01d8e18c 00411ef0 KERNELBASE!DebugBreak+0x2
01d8e1b0 00401d3f minirosetta_3_14_windows_x86_64!memcpy_s(void * dst = 0x00000000`1cf50020, unsigned int sizeInBytes = 0xfd0f2ef, void * src = 0x00000000`00000000, unsigned int count = 0xfd0f2e7)+0x2b [f:\sp\vctools\crt_bld\self_x86\crt\src\memcpy_s.c @ 55]
01d8e1d4 00ae1128 minirosetta_3_14_windows_x86_64!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > * _Right = <Memory access error>, unsigned int _Roff = <Memory access error>, unsigned int _Count = <Memory access error>)+0xbf [c:\program files (x86)\microsoft visual studio 8\vc\include\xstring @ 1049]
01d8e1f0 00ae1165 minirosetta_3_14_windows_x86_64!core::chemical::name_from_aa(core::chemical::AA aa = 0n0 (No matching enumerant))+0x68 [d:\boinc_build\minirosetta_beta_3.14\mini\src\core\chemical\aa.cc @ 253]
01d8e22c 00d5b843 minirosetta_3_14_windows_x86_64!core::chemical::operator<<(class std::basic_ostream<char,std::char_traits<char> > * os = 0x00000000`010d0820, core::chemical::AA * aa = 0x00000000`00000000)+0x35 [d:\boinc_build\minirosetta_beta_3.14\mini\src\core\chemical\aa.cc @ 245]
01d8e2c0 00d5bcb5 minirosetta_3_14_windows_x86_64!core::fragment::make_pose_from_sequence_(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > sequence = class std::basic_string<char,std::char_traits<char>,std::allocator<char> >, class core::chemical::ResidueTypeSet * residue_set = 0x00000000`00000000, class core::pose::Pose * pose = 0x00000000`167f1548)+0x113 [d:\boinc_build\minirosetta_beta_3.14\mini\src\core\fragment\frame.cc @ 68]
01d8e2f4 0085445f minirosetta_3_14_windows_x86_64!core::fragment::Frame::fragment_as_pose(unsigned int frag_num = 0, class core::pose::Pose * pose = 0x00000000`00000000, class utility::pointer::access_ptr<core::chemical::ResidueTypeSet const > restype_set = class utility::pointer::access_ptr<core::chemical::ResidueTypeSet const >)+0x35 [d:\boinc_build\minirosetta_beta_3.14\mini\src\core\fragment\frame.cc @ 402]
01d8e328 00854795 minirosetta_3_14_windows_x86_64!protocols::basic_moves::GunnCost::compute_gunn(class core::fragment::Frame * frame = 0x00000000`00000000, unsigned int frag_num = 0, struct protocols::basic_moves::GunnTuple * data = 0x00000000`6e6e7547)+0xff [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\basic_moves\gunncost.cc @ 95]
01d8e478 00a50530 minirosetta_3_14_windows_x86_64!protocols::basic_moves::GunnCost::score(class core::fragment::Frame * frame = 0x00000000`167cd0f8, class core::pose::Pose * pose = 0x00000000`01d8eb18, class utility::vector1<double,std::allocator<double> > * scores = 0x00000000`01d8e4b0)+0x2c5 [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\basic_moves\gunncost.cc @ 77]
01d8e4ec 0094b91f minirosetta_3_14_windows_x86_64!protocols::nonlocal::SmoothPolicy::choose(class core::fragment::Frame * frame = 0x00000000`167cd0f8, class core::pose::Pose * pose = 0x00000000`01d8eb18)+0x60 [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\nonlocal\smoothpolicy.cc @ 50]
01d8e7f0 009b556c minirosetta_3_14_windows_x86_64!protocols::nonlocal::SingleFragmentMover::apply(class core::pose::Pose * pose = 0x00000000`00000000)+0x15f [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\nonlocal\singlefragmentmover.cc @ 116]
01d8e97c 00403cc2 minirosetta_3_14_windows_x86_64!protocols::nonlocal::RationalMonteCarlo::apply(class core::pose::Pose * pose = 0x00000000`01d8eb18)+0x7c [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\nonlocal\rationalmontecarlo.cc @ 62]
01d8e9b4 009b51ec minirosetta_3_14_windows_x86_64!std::basic_ostream<char,std::char_traits<char> >::put(char _Ch = <Memory access error>)+0x102 [c:\program files (x86)\microsoft visual studio 8\vc\include\ostream @ 528]
01d8e9f0 00675f4c minirosetta_3_14_windows_x86_64!protocols::nonlocal::BrokenBase::apply(class core::pose::Pose * pose = <Memory access error>)+0x20c [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\nonlocal\brokenbase.cc @ 68]
01d8eafc 005d137b minirosetta_3_14_windows_x86_64!protocols::nonlocal::NonlocalAbinitio::apply(class core::pose::Pose * pose = 0x00000000`01d8eb18)+0x45c [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\nonlocal\nonlocalabinitio.cc @ 240]
01d8eca4 005d1cd4 minirosetta_3_14_windows_x86_64!protocols::jd2::JobDistributor::go_main(class utility::pointer::owning_ptr<protocols::moves::Mover> mover = class utility::pointer::owning_ptr<protocols::moves::Mover>)+0xa2b [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\jd2\jobdistributor.cc @ 376]
01d8ecc4 00837b46 minirosetta_3_14_windows_x86_64!protocols::jd2::JobDistributor::go(class utility::pointer::owning_ptr<protocols::moves::Mover> mover = class utility::pointer::owning_ptr<protocols::moves::Mover>)+0x44 [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\jd2\jobdistributor.cc @ 201]
01d8ecf8 005de323 minirosetta_3_14_windows_x86_64!protocols::jd2::BOINCJobDistributor::go(class utility::pointer::owning_ptr<protocols::moves::Mover> mover = class utility::pointer::owning_ptr<protocols::moves::Mover>)+0xb6 [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\jd2\boincjobdistributor.cc @ 96]
01d8ed48 0040579d minirosetta_3_14_windows_x86_64!protocols::nonlocal::NonlocalAbinitio_main(void * __formal = 0x00000000`00000000)+0x223 [d:\boinc_build\minirosetta_beta_3.14\mini\src\protocols\nonlocal\nonlocalabinitiomain.cc @ 80]
01d8eedc 00405bf5 minirosetta_3_14_windows_x86_64!main(int argc = 0n25, char ** argv = 0x00000000`01d8eef4)+0xe1d [d:\boinc_build\minirosetta_beta_3.14\mini\src\apps\public\boinc\minirosetta.cc @ 220]
01d8fef0 004186b7 minirosetta_3_14_windows_x86_64!WinMain(struct HINSTANCE__ * hInst = 0x00000000`76ca33ca, struct HINSTANCE__ * hPrevInst = 0x00000000`7efde000, char * Args = 0x00000000`01d8ffd4 "???", int WinMode = 0n2010029778)+0x25 [d:\boinc_build\minirosetta_beta_3.14\mini\src\apps\public\boinc\minirosetta.cc @ 292]
01d8ff88 76ca33ca minirosetta_3_14_windows_x86_64!__tmainCRTStartup(void)+0x177 [f:\sp\vctools\crt_bld\self_x86\crt\src\crt0.c @ 324]
01d8ff94 77ce9ed2 kernel32!BaseThreadInitThunk+0xe
01d8ffd4 77ce9ea5 ntdll_77cb0000!__RtlUserThreadStart+0x70
01d8ffec 00000000 ntdll_77cb0000!_RtlUserThreadStart+0x1b

32.4: kd:x86> u minirosetta_3_14_windows_x86_64!memcpy_s minirosetta_3_14_windows_x86_64!memcpy_s+0x2b
minirosetta_3_14_windows_x86_64!memcpy_s [f:\sp\vctools\crt_bld\self_x86\crt\src\memcpy_s.c @ 47]:
00000000`00411ec5 55 push ebp
00000000`00411ec6 8bec mov ebp,esp
00000000`00411ec8 56 push esi
00000000`00411ec9 8b7514 mov esi,dword ptr [ebp+14h]
00000000`00411ecc 57 push edi
00000000`00411ecd 33ff xor edi,edi
00000000`00411ecf 3bf7 cmp esi,edi
00000000`00411ed1 7504 jne minirosetta_3_14_windows_x86_64!memcpy_s+0x12 (00411ed7)
00000000`00411ed3 33c0 xor eax,eax
00000000`00411ed5 eb65 jmp minirosetta_3_14_windows_x86_64!memcpy_s+0x77 (00411f3c)
00000000`00411ed7 397d08 cmp dword ptr [ebp+8],edi
00000000`00411eda 751b jne minirosetta_3_14_windows_x86_64!memcpy_s+0x32 (00411ef7)
00000000`00411edc e81a150000 call minirosetta_3_14_windows_x86_64!_errno (004133fb)
00000000`00411ee1 6a16 push 16h
00000000`00411ee3 5e pop esi
00000000`00411ee4 8930 mov dword ptr [eax],esi
00000000`00411ee6 57 push edi
00000000`00411ee7 57 push edi
00000000`00411ee8 57 push edi
00000000`00411ee9 57 push edi
00000000`00411eea 57 push edi
00000000`00411eeb e84c210000 call minirosetta_3_14_windows_x86_64!_invalid_parameter (0041403c)


I have a mini kernel dump if anyone needs it.
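The disassembly shows the parameter checks at the top of `memcpy_s`: a zero `count` returns 0 immediately, and a null `dst` stores 0x16 (22, `EINVAL`) in `errno` before calling `_invalid_parameter`. The frame above it shows `src = 0` with `count = 0xfd0f2e7` (265,351,911 bytes, about 253 MiB). The Python model below is a hedged sketch of the documented `memcpy_s` rules, not the CRT source: `memcpy_s_model` and `_zero` are hypothetical names, `None` stands in for a null pointer, and it assumes `dst_size` equals the buffer length. Under those rules a null `src` with a nonzero count is also rejected with `EINVAL`; in the real CRT the invalid-parameter handler runs first on these paths and, by default, terminates the process.

```python
import errno


def _zero(dst, dst_size):
    # The CRT zeroes the destination buffer on the src-NULL and too-small paths.
    n = min(dst_size, len(dst))
    dst[:n] = bytes(n)


def memcpy_s_model(dst, dst_size, src, count):
    """Model of the documented MSVC memcpy_s checks; returns 0 or an errno code.

    dst is a bytearray (None models a null pointer); src is bytes-like or None.
    """
    if count == 0:
        return 0                      # nothing to copy, dst untouched
    if dst is None:
        return errno.EINVAL           # the "push 16h" path in the listing above
    if src is None:
        _zero(dst, dst_size)
        return errno.EINVAL
    if dst_size < count:
        _zero(dst, dst_size)
        return errno.ERANGE
    dst[:count] = src[:count]
    return 0
```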

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70673 - Posted: 2 Jul 2011, 5:59:42 UTC - in response to Message 70672.  

Please don't paste such a long dump. Just post a link to the task, either as a URL or as plain text, or do what I do: summarize and post a few of the most relevant pieces. Long text dumps like this clog the thread, and the team can follow the link you posted for further information.

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 70674 - Posted: 2 Jul 2011, 6:24:12 UTC

I'll post a few WUs that have failed:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=393616699

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=394975566

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=390946185

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=395576024


Hope it helps.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 70680 - Posted: 2 Jul 2011, 22:06:18 UTC

This one failed after 13 minutes.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=395597786

ggc_boinc_rosetta_cm_nonlocal_sounier_IGNORE_THE_REST_28252_8806_1

Part of the result log:

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>


# cpu_run_time_pref: 14400
cannot find aminoacid SIGSEGV: segmentation violation
Stack trace (20 frames):
[0xa38f2d7]
[0xf77f1400]
[0xa3f14cf]
[0xa05a145]
[0x92e2b97]
[0x92e2da7]
[0x8137d29]
[0x8138bce]
[0x9131838]
[0x912e0e6]
[0x9127976]
[0x9284619]
[0x8abb3dc]
[0x815fa96]
[0x8161a79]
[0x915ffa5]
[0x81041f8]
[0x8054421]
[0xa41f118]
[0x8048131]

Exiting...

</stderr_txt>
]]>

ComfortablyNumb
Joined: 6 Jul 07
Posts: 8
Credit: 658,196
RAC: 0
Message 70685 - Posted: 4 Jul 2011, 16:24:22 UTC

Nothing but computation errors lately. Is anybody else having this? I hit the maximum number of results (92) at noon. (Minirosetta 3.14)

Chilean
Posts: 711
Credit: 26,694,507
RAC: 0
Message 70687 - Posted: 4 Jul 2011, 19:46:17 UTC - in response to Message 70685.  

That's really odd. None of my PCs have errors, apart from one or two every 30-40 WUs. Is your PC running stable? Any overclocking?

If you overclock, run this benchmark: http://www.xtremesystems.org/forums/showthread.php?201670-LinX-A-simple-Linpack-interface

Set the number of runs to 20 and wait. If there are no errors, your PC is rock stable and it's a WU problem. If your PC is even slightly unstable, LinX should pick it up.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 70701 - Posted: 9 Jul 2011, 5:51:41 UTC

Hi.

Mine seems to have finished OK, but I got no credit for it; see what happened to the other two tasks below. Is there any chance this can be fixed so I get the credit?


casd_rhodopsin_boinc_1l0mA_53.abrelax_cs_frags.pctid_0.25.tmscore_0.66390._abrelax_cs_frags_tex_IGNORE_THE_REST_27887_8592

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=394828271

Sent                      Reported or deadline     Server state  Outcome         Client state   CPU time (s)
27 Jun 2011 20:41:20 UTC  7 Jul 2011 20:41:20 UTC  Over          No reply        New            0.00
7 Jul 2011 20:49:24 UTC   7 Jul 2011 21:59:42 UTC  Over          Client error    Compute error  0.00
7 Jul 2011 22:06:17 UTC   9 Jul 2011 5:37:38 UTC   Over          Validate error  Done           13,837.73  (mine)

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2011- 7- 8 22:53:55:] :: BOINC:: Initializing ... ok.
[2011- 7- 8 22:53:55:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.



# cpu_run_time_pref: 14400
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage_3_iter1_1 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage_3_iter1_2 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage_3_iter1_3 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage_3_iter1_4 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage_3_iter1_5 ... success!
Continuing computation from checkpoint: chk_S_00003_FragmentSampler__stage_3_iter1_6 ... success!
======================================================
DONE :: 21 starting structures 13837.5 cpu seconds
This process generated 21 decoys from 21 attempts
======================================================
BOINC :: WS_max 5.1645e+120

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state: Invalid
Claimed credit: 102.49362635862
Granted credit: 0
Application version: 3.14

Rabinovitch
Joined: 28 Apr 07
Posts: 28
Credit: 5,439,728
RAC: 0
Message 70704 - Posted: 10 Jul 2011, 5:49:05 UTC
Last modified: 10 Jul 2011, 5:50:03 UTC

Hi all!
I have just installed a fresh "incarnation" of Kubuntu 10.10 amd64, plus BOINC and the 32-bit libraries (x32-libs) recommended by boinc.berkeley.edu. All my CPU tasks (Rosetta, RALPH and QMC) are exiting with "Compute error" after several hours of processing. The Rosetta application reports different errors, for example:

Task 435109856

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
[2011- 7- 9 18:58: 0:] :: BOINC:: Initializing ... ok.
[2011- 7- 9 18:58: 0:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/ECH19_looprem_verif_long404.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 86400
FILE_LOCK::unlock(): close failed.: Bad file descriptor
[2011- 7- 9 19:27:26:] :: BOINC:: Initializing ... ok.
[2011- 7- 9 19:27:26:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/ECH19_looprem_verif_long404.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 86400

</stderr_txt>
]]>

Task 435105624

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Input file minirosetta_database_rev42272.zip missing or invalid: -120
</message>
]]>

For QMC tasks, the message is always "process got signal 11".

What could be the matter? Einstein's CUDA WUs are working well enough.
From Siberia with love!

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 70705 - Posted: 10 Jul 2011, 15:30:51 UTC
Last modified: 10 Jul 2011, 15:33:10 UTC

Looks like it's having a problem unzipping the minirosetta_database_rev42272.zip file mentioned. You might just download it to a sandbox and see if you can unzip it from the command line. If not, perhaps there is now something inconsistent about your setup that is interfering with the unzip. Here's a direct link to download that specific file outside the BOINC client (just use wget), so you can tinker with it.

https://boinc.bakerlab.org/rosetta/download/minirosetta_database_rev42272.zip

Going through the steps manually might also uncover any network, firewall or antivirus issues that may be affecting things.
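For anyone who would rather script the check described above, here is a minimal Python sketch using the standard-library `zipfile` module; `check_zip` and the script name `check_zip.py` are hypothetical. `ZipFile.testzip()` reads every member and checks its CRC, the same kind of test `unzip -t` performs. It cannot explain why BOINC reported error -120; it only tells you whether the downloaded copy of the archive is intact.

```python
import sys
import zipfile
import zlib


def check_zip(path):
    """Return None if the archive reads cleanly, otherwise a short description of the problem."""
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()   # reads every member and checks its CRC
    except (zipfile.BadZipFile, zlib.error) as exc:
        return f"not a readable zip archive: {exc}"
    except OSError as exc:
        return f"cannot open file: {exc}"
    if bad_member is not None:
        return f"CRC check failed for member: {bad_member}"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_zip.py minirosetta_database_rev42272.zip
    problem = check_zip(sys.argv[1])
    print(problem or "archive OK")
```

If a copy fetched with wget reports "archive OK", the file on the server is fine, which points back at something in the local setup.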
Rosetta Moderator: Mod.Sense

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 70708 - Posted: 11 Jul 2011, 12:03:04 UTC

I have added a new machine to my account, and it has several days' worth of WUs. The WUs start processing, and at some point (generally when they are more than 50% complete) they change to "waiting to run" while other WUs start up showing "running, high priority". What would make this happen?

robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 70709 - Posted: 11 Jul 2011, 13:47:56 UTC - in response to Message 70708.  

A typical result of connecting to a BOINC project that overestimates how many workunits a new computer can complete by the deadline.

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 70710 - Posted: 11 Jul 2011, 15:14:26 UTC - in response to Message 70709.  

Thanks

robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 70711 - Posted: 11 Jul 2011, 16:44:26 UTC
Last modified: 11 Jul 2011, 16:46:53 UTC

You're welcome.

BOINC often corrects that problem after it has completed enough workunits from that project to make a better estimate of how long each workunit should run on that computer.
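Roughly speaking, BOINC clients of this era scale a project's run-time estimates by a per-project duration correction factor learned from completed tasks, and switch tasks to "running, high priority" when they predict that a deadline would otherwise be missed. The Python sketch below is a deliberately simplified, single-CPU illustration of that idea, not the client's actual scheduler; `misses_a_deadline` and the example queue are hypothetical. It shows how the same queue flips from safe to at-risk once the correction factor rises.

```python
def misses_a_deadline(tasks, now, dcf):
    """Simplified single-CPU check: would running tasks earliest-deadline-first miss one?

    tasks: list of (estimated_remaining_seconds, deadline_seconds) pairs.
    dcf:   duration correction factor scaling the project's time estimates.
    This is an illustration, not BOINC's actual scheduling code.
    """
    finish = now
    for remaining, deadline in sorted(tasks, key=lambda t: t[1]):
        finish += remaining * dcf
        if finish > deadline:
            return True
    return False


HOUR = 3600
queue = [(4 * HOUR, 6 * HOUR), (4 * HOUR, 10 * HOUR)]
print(misses_a_deadline(queue, now=0, dcf=1.0))   # False: finishes at 4 h and 8 h
print(misses_a_deadline(queue, now=0, dcf=1.5))   # True: second task would finish at 12 h
```

As the client completes more tasks its correction factor settles, so predictions like the second one stop firing and the high-priority switching dies down.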

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 70726 - Posted: 15 Jul 2011, 8:50:01 UTC
Last modified: 15 Jul 2011, 8:51:24 UTC

Like several people here, I'm getting the occasional hang with the new 3.14 client. I hope to come back and edit this post to add more affected WUs, since I get about one a day from the farm I have here.

To get things started ...

ilv_fgf2_all_boinc_1n4kA_124.nonlocal.pctid_0.19.tmscore_0.62808._nonlocal_tex_IGNORE_THE_REST_27534_15684_0

-- Edit -- missed a period in the WU name.

Jim Strait
Joined: 10 Dec 05
Posts: 2
Credit: 3,633,547
RAC: 947
Message 70727 - Posted: 15 Jul 2011, 11:36:52 UTC

Recently I have been getting a number of pop-ups that say:
===========================================
Microsoft Visual C++ Runtime Library

Runtime Error!

Program: ...kerlab.org_rosetta\minirosetta_3.14_windows_intelx86.exe

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
===========================================

I am running Microsoft Windows XP Professional x86 Edition, Service Pack 3 (05.01.2600.00), on a GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [x86 Family 6 Model 26 Stepping 5].

©2024 University of Washington
https://www.bakerlab.org