Problems with MadGraph's disk usage and file count
Hello
Setup is still the same as in my earlier questions: I am running MadGraph 3.5.1 with the following proc card:
```
#************************************************************
#*                     MadGraph5_aMC@NLO                    *
#*                                                          *
#*                *                       *                 *
#*                  *        * *        *                   *
#*                    * * * * 5 * * * *                     *
#*                  *        * *        *                   *
#*                *                       *                 *
#*                                                          *
#*                                                          *
#*         VERSION 3.5.1                2023-07-11          *
#*                                                          *
#*    The MadGraph5_aMC@NLO Development Team - Find us at   *
#*    https://server06.fynu.ucl.ac.be/projects/madgraph     *
#*                                                          *
#************************************************************
#*                                                          *
#*               Command File for MadGraph5_aMC@NLO         *
#*                                                          *
#*     run as ./bin/mg5_aMC  filename                       *
#*                                                          *
#************************************************************
set group_subprocesses Auto
set ignore_
set low_mem_
set complex_mass_scheme False
set include_
set gauge unitary
set loop_optimized_
set loop_color_flows False
set max_npoint_
set default_
set max_t_for_channel 99
set zerowidth_tchannel True
set nlo_mixed_expansion True
set crash_on_error true
import model /home/users/
epo/data_
ived_haa
define p = g u c d s u~ c~ d~ s~
define j = g u c d s u~ c~ d~ s~
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
define l+ = e+ mu+
define nu = vl vl~
generate p p > t t~ x0, x0 > a a, (t > w+ b QNP=0, w+ > l+ nu QNP=0), \
(t~ > w- b~ QNP=0, w- > j j QNP=0)
output /tmp/tmpiym152v
```
using a Docker container defined through the following Dockerfile:
```
FROM scailfin/
USER root
WORKDIR /root
# Do an initial run of MadGraph to build the initial templates.
RUN lhapdf install NNPDF23_
&& mg5_aMC \
&& rm py.py
RUN pip3 install jinja2
```
The model I am using is a slightly modified version of the Higgs Characterisation UFO model (http://
I need to run about 300 madgraph jobs that each generate about 100,000 events, 100 of these get 100,000 by doing a parameter scan (over 1,000 params) with 100 events for each run. In theory this all works fine, however I noticed that madgraph is generating output directories with a size of 130GB and a file count of up to 80000 for the scans. 130GB per job quickly exhausts the local storage of the cluster nodes, while the file count is too much for the external storage (shared between nodes) which is capped at 10 million files. And with too much I mean too much even if these files are only generated temporarily.
I know that MadGraph has a cluster mode; however, my current setup involves launching jobs with a workflow management system (https:/
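Concretely, each such job runs MadGraph on its worker node only, i.e. something along these lines in the MadGraph configuration (`run_mode` and `nb_core` are standard configuration options; the core count here is illustrative):
```
# per-job configuration: run everything on the local worker node
# instead of letting MadGraph submit its own sub-jobs (cluster mode)
set run_mode 2    # 0 = single machine, 1 = cluster, 2 = multicore
set nb_core 4     # illustrative; matches what the workflow manager allocates
```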
My question would be: is there a way to make MadGraph generate fewer files or use less disk space? It also seems that MadGraph generates far more events with Pythia than it actually saves at the end. Why is that the case, and why does Pythia not do that when I generate the hard scattering plus showering with it directly? Is there anything else you would suggest that I haven't thought of yet?