NAMD differs from most LCRC applications because it uses Charm++, rather than MPI directly, to handle communication between processes. The most current builds of NAMD (v2.12) use Charm++ over MPI. NAMD 2.12 builds are available for the Knights Landing (KNL) and Broadwell (BDW) nodes on Bebop. A build for the Materials Science Broadwell nodes on Blues is also available.
You can improve the parallel scaling of NAMD on the KNL nodes by using communication threads, which allow asynchronous communication between processing elements. Leave one core per node unused by NAMD so that it can absorb overhead from the operating system. James Phillips (UIUC) described this in detail in a recent presentation at the Charm++ workshop. This is a link to his presentation.


On the KNL nodes, communication threads improve the parallel scaling of NAMD. Phillips recommends using 13 communication threads per node. This is a good recommendation for runs on more than 32 of Bebop’s KNL nodes. For more than 2 and fewer than 64 nodes, 7 or 9 communication threads per node give better performance. For fewer than 2 nodes, one communication thread is adequate. A sample script is provided in /soft/namd/2.12/bebop/knl.sh.
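The core budget described above (one core left for the operating system, the rest divided among communication and worker threads) can be sketched as simple arithmetic. This is only an illustration under stated assumptions: it assumes one thread per core and one communication thread per process, as in a Charm++ SMP build, and the names `Layout` and `node_layout` are hypothetical rather than taken from knl.sh. Check the sample script for the settings actually used.

```python
from collections import namedtuple

# Hypothetical helper type: how one node's cores are divided.
Layout = namedtuple("Layout", "processes workers_per_process cores_used")


def node_layout(cores_per_node, comm_threads, reserved_cores=1):
    """Split a node's cores among NAMD processes.

    Assumptions (not from the source): one thread per core, and each
    process owns exactly one communication thread plus its workers.
    """
    if comm_threads < 1:
        raise ValueError("need at least one communication thread")
    # Cores left after setting aside the core(s) for the operating system.
    usable = cores_per_node - reserved_cores
    # Each process gets an equal share: 1 communication thread + workers.
    cores_per_process = usable // comm_threads
    workers = cores_per_process - 1
    if workers < 1:
        raise ValueError("too many communication threads for this node")
    return Layout(comm_threads, workers, comm_threads * cores_per_process)
```

For example, on a hypothetical 64-core node, `node_layout(64, 13)` yields 13 processes with 3 worker threads each, occupying 52 cores, while `node_layout(64, 7)` yields 7 processes with 8 worker threads each, occupying 63 cores.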
One communication thread per node gives the best performance on fewer than four BDW nodes. For four or more nodes, seven or nine communication threads per node give good parallel scaling on up to 32 BDW nodes. A sample script to run NAMD on the Bebop BDW nodes can be found at /soft/namd/2.12/bebop/bdw.sh.
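The BDW guidance above can be written as a small lookup, which may be handy when generating job scripts for different node counts. This is a minimal sketch: the function name `bdw_comm_thread_options` is hypothetical, and because the text gives no guidance beyond 32 nodes, the sketch refuses to guess in that range.

```python
def bdw_comm_thread_options(nodes):
    """Return the communication-thread counts per node suggested
    in the text for a run on the given number of BDW nodes."""
    if nodes < 1:
        raise ValueError("node count must be at least 1")
    if nodes < 4:
        # Fewer than four nodes: one communication thread per node.
        return [1]
    if nodes <= 32:
        # Four to 32 nodes: seven or nine communication threads per node.
        return [7, 9]
    # The text reports results only up to 32 BDW nodes.
    raise ValueError("no guidance given for more than 32 BDW nodes")
```

For example, `bdw_comm_thread_options(2)` returns `[1]` and `bdw_comm_thread_options(16)` returns `[7, 9]`; which of the two values performs better for a given system is best determined by a short benchmark.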
Overall, the multinode performance of NAMD 2.12 on BDW and KNL nodes is about the same. KNL outperforms BDW on one or two nodes, and BDW outperforms KNL on four or more nodes.