Version 6.8 of Quantum Espresso has been benchmarked on the Bebop Broadwell (BDW) and Knights Landing (KNL) nodes and on the Swing GPU nodes. The builds are in the subfolders /soft/espresso/6.8/{bdw,knl-omp,swing}. Sample bash scripts for each architecture are in the /soft/espresso/6.8/{bdw,knl-omp,swing}/examples folders and can be submitted as batch jobs through Slurm (sbatch).
On the PSIWAT benchmark, the KNL nodes are 40% faster than the Broadwell nodes. Since a KNL node costs about 60% as much as a BDW node, the KNL nodes are the most cost-efficient architecture for running Quantum Espresso on Bebop.
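The cost argument above can be checked with a little arithmetic. The sketch below assumes that "40% faster" means 1.4 times the throughput of a BDW node and that "60% of the cost" refers to the charge per node-hour; both readings are assumptions, and the function name is made up for illustration.

```python
def relative_cost_per_job(speedup: float, node_cost_ratio: float) -> float:
    """Cost of one job on a node type, relative to the baseline node type.

    speedup: throughput relative to the baseline (1.4 is read as "40% faster").
    node_cost_ratio: node-hour charge relative to the baseline (0.6 is 60%).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # A job that runs `speedup` times faster uses 1/speedup of the node-hours,
    # and each node-hour is charged at `node_cost_ratio` of the baseline rate.
    return node_cost_ratio / speedup


# Figures quoted in the text for PSIWAT: KNL versus BDW (assumed interpretation).
knl_vs_bdw = relative_cost_per_job(speedup=1.4, node_cost_ratio=0.6)
print(f"KNL cost per PSIWAT job relative to BDW: {knl_vs_bdw:.2f}")
```

Under these assumptions a PSIWAT run on KNL costs roughly 43% of the same run on BDW.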
Note that there was not enough memory to run the PSIWAT benchmark on one KNL node or one GPU.
Two GPUs on a Swing node are five times faster than two KNL nodes and eight times faster than two BDW nodes. For the PSIWAT benchmark, a pair of GPUs delivers higher throughput than sixteen BDW or KNL nodes.
Unfortunately, the number of GPUs you can use is capped by the number of pools (k-points) in Quantum Espresso and constrained by the GPU memory. If your job dies on one GPU, try running on more GPUs to access more GPU memory. The GPUs are a good option for improving the throughput of your Quantum Espresso simulations.
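As a rough illustration of the pool limit, the sketch below picks a GPU count no larger than the number of k-points and builds a matching pw.x command line. It assumes one MPI rank per GPU and one pool per rank, launches through a generic mpirun, and uses a hypothetical input file name and k-point count; the helper names are illustrative, and the sample scripts in the examples folders remain the reference for the launcher and options used on Swing.

```python
def usable_gpus(n_kpoints: int, gpus_available: int) -> int:
    """Largest GPU count that does not exceed the number of k-point pools."""
    if n_kpoints < 1 or gpus_available < 1:
        raise ValueError("need at least one k-point and one GPU")
    return min(n_kpoints, gpus_available)


def pw_command(n_gpus: int, input_file: str) -> list[str]:
    """Build a pw.x command with one MPI rank and one pool per GPU (assumed layout)."""
    return [
        "mpirun", "-np", str(n_gpus),   # assumed launcher; the site scripts may differ
        "pw.x", "-npool", str(n_gpus),  # -npool sets the number of k-point pools
        "-in", input_file,
    ]


# Hypothetical case: 4 k-points in the input, 8 GPUs on the node.
n_gpus = usable_gpus(n_kpoints=4, gpus_available=8)
print(" ".join(pw_command(n_gpus, "psiwat.in")))
```

With these hypothetical numbers only four of the eight GPUs are used, because a fifth pool would have no k-point to work on.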