t393_looprelax_round1_fullatom_relax_aaT0393...etc failures

Message boards : Number crunching : t393_looprelax_round1_fullatom_relax_aaT0393...etc failures

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 53472 - Posted: 30 May 2008, 23:07:07 UTC
Last modified: 30 May 2008, 23:09:27 UTC

What's up with nearly 10 gigs of disk space being used for these files? If you look at my error reports in 1.24, you'll see I got something like 7 of these tasks, all with the same result:

rosetta@home|Aborting task t393_looprelax_round1_fullatom_relax_aaT0393_2AHRA_1_0001_3559_941_0: exceeded disk limit: 96.25MB > 95.37MB

See my post in the 1.24 thread on the settings I had and changed on disk usage.

Are these huge files going to work now that I upped the "use at most" setting from 25% to 50% and allocated 10 gigs to use? These t393 tasks must be really huge to use up so much disk space or something.

I have 18 gig free on the drive that boinc is on.
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 53474 - Posted: 31 May 2008, 1:51:32 UTC - in response to Message 53472.  

The WU isn't using 10GB, it's using around 100MB.

BOINC has several limits on disk usage, and the WU is ended if any one of those limits is exceeded. In this case the WU itself specified a limit of 95.37MB (100,000,000 Bytes), and the WU was ended when it exceeded that limit.
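
For anyone checking the numbers, here is a small sketch of the unit conversion. It assumes the "MB" in the log line means bytes divided by 1024*1024, which is an assumption on my part, though it does reproduce the figure in the message. The function name is made up for illustration.

```python
# Assumption: the client reports "MB" as bytes / (1024 * 1024).
BYTES_PER_MB = 1024 * 1024

def bytes_to_mb(n_bytes):
    """Convert a byte count to MB using the 1024*1024 convention."""
    return n_bytes / BYTES_PER_MB

limit_bytes = 100_000_000          # limit specified by the WU
limit_mb = bytes_to_mb(limit_bytes)
usage_mb = 96.25                   # usage reported in the log line

print(f"limit: {limit_mb:.2f}MB")          # limit: 95.37MB
print("over limit:", usage_mb > limit_mb)  # over limit: True
```

So the WU went over its own limit by less than 1MB, and none of this is anywhere near 10GB.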
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 53485 - Posted: 31 May 2008, 19:11:18 UTC - in response to Message 53474.  

I am not that good with math, but here are the settings at the time:
18 gigs free, 10 gigs allocated to Rosie/BOINC, leave at least 0.10 gigs free.
At the time I had it set to 25% of total disk space; now it's up to 50%. So at 25% that should have left about 4.5 gigs free if I am doing the math right. You're telling me that 100 MB does not fit within 4.5 gigs of disk space? That's a farce if I ever saw one. Now I could be off my rocker, but it seems to me I had enough disk space. 100 MB is only a fraction of 1 gig and I gave it 4 gigs to work with. What's up with that?
Path7

Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 53488 - Posted: 31 May 2008, 20:06:18 UTC - in response to Message 53485.  

Hello Greg,

The error: <message> Maximum disk usage exceeded </message> can have 2 different causes:
1. The amount of disk space allowed by BOINC (defined by the user) is lower than the amount of disk space the application wants to use.
2. The amount of disk space allowed by the application (defined by the techs) is lower than the amount of disk space the application wants to use.
Since you defined 10 GB as free space for R@h to use, it is unlikely to be the BOINC max that causes some WUs to error out; more likely it is the application max.
As far as I know, altering the BOINC disk settings does not change the application's max disk space setting.
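
To make the two cases concrete, here is a rough sketch using the numbers from this thread. It is only an illustration of the reasoning, not how BOINC is actually implemented; the function name and the limit labels are made up, and MB/GB are assumed to be powers of 1024.

```python
# Assumption: MB and GB are powers of 1024, as the log figures suggest.
MB = 1024 * 1024
GB = 1024 * MB

def exceeded_limits(usage_bytes, limits):
    """Return the name of every limit that the given usage exceeds."""
    return [name for name, limit in limits.items() if usage_bytes > limit]

# Hypothetical labels; the values come from the posts above.
limits = {
    "user limit (BOINC preferences)": 10 * GB,      # Greg's 10 GB allocation
    "application limit (set by techs)": 100_000_000,  # limit stated by the WU
}

usage = 96.25 * MB  # usage reported when the task was aborted

print(exceeded_limits(usage, limits))
# ['application limit (set by techs)']
```

Only the application limit is exceeded, which is why raising the user-side settings should not make a difference for these WUs.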

Hopefully the techs will look into the application max disk space setting.

Path7.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 53499 - Posted: 1 Jun 2008, 7:50:30 UTC - in response to Message 53488.  

Ahh, thanks for that, Path7. Seems like there is some strange logic going on with some of the tasks.


AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 53507 - Posted: 1 Jun 2008, 15:12:52 UTC

The logic is really not that strange. The WU specifies limits on the resources it will use, such as disk, memory, and CPU cycles.
If the WU uses more than the limit, then something is very wrong and the WU should be terminated by BOINC.
The WU said it would never use more than 95.37MB unless something was wrong.
BOINC saw the WU using more than 95.37MB, so BOINC ended the WU.

As to what was wrong with the WU, I've looked at the slots directory of some of those t* WUs and there were a LOT of files like:

ClassicRelax_106_stage_2.mc_last.pdb
ClassicRelax_106_stage_2.mc_low.pdb
ClassicRelax_106_stage_2.pdb
ClassicRelax_106_stage_2.rng.state.gz

They were in groups of 4, with each group having different numbers. The *.pdb files were about 200k each, and the *.gz files were only 6k or so.

My guess is that the *.pdb files should have been deleted after being compressed into a *.gz, but they weren't. Thus they kept accumulating, using 600k of disk per group, until the disk usage limit was hit.
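
A back-of-the-envelope check of that guess, using the approximate file sizes above. This is only a sketch: it assumes "k" means 1024 bytes, treats every group as exactly three 200k *.pdb files plus one 6k *.gz file, and ignores everything else in the slot directory, so the real number of groups at the point of failure would be somewhat lower.

```python
# Assumptions: "k" = 1024 bytes; sizes are the rough figures quoted above.
KB = 1024
PDB_SIZE = 200 * KB            # each *.pdb file
GZ_SIZE = 6 * KB               # each *.rng.state.gz file
DISK_LIMIT = 100_000_000       # limit specified by the WU, in bytes

# One group = 3 *.pdb files + 1 *.gz file.
group_size = 3 * PDB_SIZE + GZ_SIZE
groups_before_limit = DISK_LIMIT // group_size

print(group_size // KB, "KB per group")            # 606 KB per group
print(groups_before_limit, "groups fit under the limit")  # 161 groups fit under the limit
```

Under these assumptions, roughly 160 leftover groups would be enough to reach the limit, which seems at least plausible given group numbers like 106 in the file names.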
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 53509 - Posted: 1 Jun 2008, 19:52:23 UTC - in response to Message 53507.  

So it's some sort of programming glitch or other program error.
Nothing to do with my settings.
Still bites to crunch and then get the thumbs down.