High-Performance Computing with MSC Nastran

Discover the powerful solvers available in MSC Nastran

Contact us

Vasiliki Tsianika, PhD, MSC Nastran Product Manager
Mark Robinson, Senior Technical Specialist

MSC Nastran is a powerful finite element analysis (FEA) software that is foundational to engineering simulation. It has been used and validated by experts in structural analysis for half a century and is renowned for robustness, accuracy, and the scope of engineering challenges it’s able to address.


Executive summary

  • Learn how High-Performance Computing (HPC) strategies will enhance your simulation and analysis performance with MSC Nastran
  • Explore the available solvers for all analysis types, including statics, eigenvalues, dynamics, and nonlinear, so you can select the most suitable solver for your specific simulation needs. 
  • Draw insights from the experience of other MSC Nastran users and Hexagon’s experts to achieve optimal parallel performance while minimizing the costs of reading and writing to the disk. 
  • Integrate HPC expertise with a comprehensive understanding of MSC Nastran’s solvers to significantly speed up simulations, minimize costs and boost the efficiency of different types of analyses.

For further details on high-performance computing options in MSC Nastran, please consult the MSC Nastran 2023.4 HPC User’s Guide.


1. Solver classification

MSC Nastran includes a wide range of solvers. Selecting the ideal solver for your analysis depends on the specific engineering challenge you’re addressing. MSC Nastran solvers can be categorized into three main groups: direct solvers, iterative solvers, and eigenvalue solvers. 

For example, a linear static analysis calculates displacements and other result quantities such as stresses. The performance of the corresponding static simulation is dominated by two operations: solving the matrix problem and writing the requested data to the output files.  To solve the matrix problem, we can use either a direct or an iterative method, where the input is the stiffness matrix and the loads, and the output is the displacements.

Direct solvers 
Direct solvers rely on LDLT decomposition and are widely used in structural analysis to derive solutions that are insensitive to the numerical characteristics of the stiffness matrix; they operate via a two-step process. Initially, the symmetric stiffness matrix is factorized into a lower triangular matrix and a diagonal matrix. Subsequently, a forward elimination and backward substitution (FBS) is performed to solve the resultant system; this is often denoted as the FBS part of the solution algorithm.
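
Written out in generic notation (standard linear algebra, not MSC Nastran internals), the two steps for solving K·u = f are:

K = L·D·Lᵀ       (factorization: L lower triangular, D diagonal)
L·y = f          (forward elimination)
D·z = y
Lᵀ·u = z         (backward substitution, giving the displacements u)

The factorization is performed once per stiffness matrix, while the forward/backward pass is repeated for each load vector.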

Direct methods exploit the sparsity inherent in the stiffness matrix (Figure 1). A sparse matrix is simply one in which most of the terms are zero. Multi-frontal algorithms take advantage of this sparsity to reduce computational times and memory requirements.

Matrix factorization generally comprises 80-90% of the total solve time, while the FBS component generally consumes the remaining 10-20%. There are three direct methods available in MSC Nastran:

  • The MSC Sparse Direct Solver (MSCLDL) 
  • The Pardiso solver (PRDLDL)
  • The MUMPS solver (MUMPS and MUMPSBLR)

Figure 1: Sparse Matrix in two dimensions. The non-zero terms are shown in black. (Wikipedia)

The MSCLDL solver is the original sparse direct solver of MSC Nastran; although it is designed to run in very limited memory, it has limited parallel scalability. On the other hand, both the Pardiso and MUMPS solvers consume between 5 and 12 times more memory than MSCLDL, depending on the model, but can exhibit greater performance, especially when utilized with Shared Memory Parallelism (SMP).

Iterative solvers 

Iterative solvers are another option for solving linear equations. They work by reducing the error in an approximate solution through an iterative process, eventually converging within an acceptable tolerance. Preconditioners are used to compute an approximate initial solution, a good starting point that is then progressively refined until the residual is acceptably small.

Iterative methods typically rely on techniques like the Conjugate Gradient method or the GMRES method. Although these methods can be much faster and consume much less memory than direct solvers, they generally only perform well on certain types of problems, such as solid element-dominated models with few load cases. In MSC Nastran, the CASI iterative solver performs best on general problems of this type, while the MSC iterative solver is available for more awkward matrix topologies.

Eigenvalue solvers 

In linear dynamics analysis, the dynamic stiffness matrix must be factorized more than once. Although a solution in terms of the physical variables (degrees of freedom) is possible, it is generally undesirable as it leads to extended run times.  Modal reduction is often used to transform the system’s physical variables to a set of modal ones, which requires solving an eigenvalue problem to obtain the eigenvalues (and natural frequencies) and eigenvectors (mode shapes) of the system.  These intrinsic properties of the system help the engineer understand its behaviour and aid in the design and assessment of system performance under varying conditions. Over the years, multiple methods for solving the eigenvalue problem have been added to MSC Nastran, including the inverse iteration, Householder, Givens, ACMS and Lanczos methods. Today, the two most common methods are:

  • Lanczos method
  • ACMS (Automated Component Mode Synthesis)
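
For reference, the eigenvalue problem both methods solve is the standard undamped free-vibration problem (generic notation, not tied to either method):

(K − λ·M)·φ = 0,   with λ = ω² and natural frequency f = ω / (2π)

where K is the stiffness matrix, M the mass matrix, λ an eigenvalue and φ the corresponding eigenvector (mode shape).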

The Lanczos method only makes the calculations necessary to find the roots requested. It uses Sturm sequence logic to ensure that all modes are found. When extracting a relatively small number of eigenvalues, the bulk of the computation time for Lanczos is spent performing symmetric factorization, so that optimal compute considerations are virtually the same as for linear static analysis in MSC Nastran.

ACMS is a multi-level modal reduction technique that produces a close approximation to the normal modes solution.  It is optimal for analyses requiring a significant number of vibration modes, and also for jobs requiring relatively few vibration modes of very large models. Dynamic response in the frequency domain using modal reduction is even faster when combining ACMS with FastFR. FastFR is an acceleration method for modal frequency response runs that performs well for systems with a large number of modes and/or a large number of excitation frequencies.

Next, we delve into further details regarding memory, hardware, and parallel settings to guarantee the solver operates at its optimal performance level.

 

2. Memory considerations

Typical structural simulations carried out using MSC Nastran include linear static, normal modes, dynamic response and nonlinear analyses. When the numeric operations on the system matrices can be accommodated within available memory, how quickly the problem can be solved is primarily influenced by memory bandwidth. However, it is common for the size of the problem to exceed available memory capacity. In that case, input/output (I/O) performance and memory capacity become critical determinants of performance.  Although this scenario is frequently encountered in MSC Nastran, there are instances where factors like flops (floating point operations) per core or more cores can be equally advantageous.

Random Access Memory (RAM)

Most of the direct solvers allow problems to be solved in-core or out-of-core. 

  1. In-core processing
    In-core processing means the entire numerical problem can fit into the memory (RAM) allocated to solve the problem.  The solution is fast because access to data in memory is much quicker than retrieving it from some other storage medium.
  2. Out-of-core processing
    Out-of-core processing means there is only enough RAM allocated to fit part of the problem into memory, so the solution process is carried out in parts.  When a new part of the problem is solved, data must be moved out of memory to some storage medium to make room for the next part of the problem.  Depending on the speed of the chosen storage medium, the movement of data will take more or less time, but it is always much slower than accessing data directly in memory.

The speed of data access to storage media has increased significantly over the last few years. When correctly configured, the delay associated with writing to storage media can be largely mitigated, but solving the problem out-of-core in parts will always have a performance penalty.  Irrespective of the way the problem is solved, the solution process also generates temporary data, which is only needed to manage the solution process.  If enough memory is available this temporary data may also be stored in a buffer in memory. If there is not enough memory, it must be written to some other storage medium. The memory allocated to any significantly sized problem will always affect performance.  The operating system has its own I/O buffering logic, but it is always better to use the buffering system provided by MSC Nastran via its buffer pool.

To simply illustrate how MSC Nastran uses memory, let’s ignore the small amount of memory allocated by MSC Nastran to its executive system.  There are then essentially two significant regions of memory that MSC Nastran sets up when a job is run.  The principal region is called open core and it is also referred to as HICORE.  The other significant region is allocated to buffer pooling, and it gets the name BPOOL.  

Figure 2 illustrates how MSC Nastran allocates memory for various tasks. When a job is submitted, MSC Nastran uses a portion of the total physical available RAM on the machine, as specified by the user. If "mem=max" is used during submission, MSC Nastran will use 50% of the physical memory available. From this amount of memory, a portion is allocated to the buffer pooling system BPOOL and another portion to the solver HICORE.

  • BPOOL uses an internal caching algorithm. Increasing the size of BPOOL leads to fewer reads and writes to the disk-type storage media resulting in reduced input/output operations. This is particularly beneficial for large problems with high I/O demands and can provide significant gains in elapsed time for computer systems with poor I/O configurations.  On the other hand, the larger BPOOL gets, the smaller HICORE can be.
  • HICORE represents the memory allocated for solving the numerical problem at hand.

Figure 2: Memory Layout for an MSC Nastran simulation.

Figure 3 illustrates an example of a 2-million degree-of-freedom (DOF) system undergoing a static contact analysis (SOL 101) on three machines. These machines have identical hardware configurations except for the amount of installed RAM, which is 16GB, 64GB, and 128GB respectively.


Figure 3: Example of Memory Layout (bar chart) and Performance (yellow line) for Varying RAM

For this model, increasing RAM from 16 GB to 64 GB resulted in a 53% reduction in elapsed time, and increasing it to 128 GB resulted in an overall reduction of 63%.  Notice that the amount of memory allocated to the solver (HICORE) remains the same (shown in blue) while the size of BPOOL varies (shown in orange). Increasing the size of BPOOL reduces elapsed time, as critical data is cached in the BPOOL memory and I/O operations are greatly accelerated.

The general recommendation for memory allocation is to use memory=max on the MSC Nastran command line. This will assign 50% of the physical memory of the machine to the job and distribute this amount to the solver and the buffer pool depending on model size, characteristics, and available RAM.
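
As an illustration, a submission following this recommendation might look like the line below (the input file name is hypothetical and the standard nastran submission command is assumed):

nastran my_model.dat memory=max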


3. Parallel Settings

Central processing units (CPUs) in just about every computer these days have more than one core on the physical CPU chip.  Each physical chip is called a socket. Depending on the architecture, the computer may be fitted with multiple CPU chips—each in its own socket—and each CPU may include multiple cores.  If software is written to exploit multiple cores, then how much faster the software can solve any given problem using these cores in parallel is called scalability.   

Scalability is crucial for solvers because it ensures the ability to handle increasingly larger and more complex problem sizes, accommodating growing demands without compromising performance or efficiency. In the context of MSC Nastran, there are two different, yet complementary ways to use parallel processing:

  • Shared Memory Parallel (SMP)
  • Distributed Memory Parallel (DMP)

The distinction between the two is best illustrated in Figure 4. Shared Memory Parallel (SMP) systems featuring multiple processors share a single, common memory space and I/O system, allowing for direct communication and data sharing without message passing. In contrast, Distributed Memory Parallel (DMP) systems allocate each processor its own memory and I/O system, necessitating explicit message passing for inter-processor communication. SMP architectures are ideal for multi-threaded applications on multi-core CPUs, while DMP architectures are common in large-scale parallel computing environments like clusters.


Figure 4: Difference between SMP and DMP

As an example, the best performance for an eigenvalue analysis problem is normally achieved using a combination of SMP and DMP. The general recommendation is to set DMP to the number of sockets (if there are several sockets in the machine) and to set SMP to the number of cores per socket. If you run on a system with two CPUs (sockets) with 10 cores on each, this means setting dmp=2 and smp=10. The performance of different combinations of SMP and DMP varies depending on the specific problem being solved and the model in use.
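
For the two-socket, ten-cores-per-socket machine described above, the submission might look like this (hypothetical file name; keywords as used elsewhere in this paper):

nastran my_model.dat memory=max dmp=2 smp=10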


4. Hardware configurations

Disk
Modern operating systems do a good job of avoiding unnecessary physical I/O operations—that is, operations that entail writing to or reading from some permanent storage device, like a hard disk (HDD magnetic storage), a solid-state disk (SSD) or non-volatile memory (NVMe).  However, there is a limit to their ability to do this, controlled mainly by how much memory (RAM) installed in the system is available for buffered I/O (i.e. memory that is not taken up by currently running processes.)  Most software relies on the operating system I/O buffers to work efficiently, but some software, including MSC Nastran, implements its own buffered I/O logic, though there is a limit to this as well.  Given the typical size of a modern-day finite element model, problems that exceed the size of the I/O buffering system are commonplace.  The speed of the I/O system of the computer naturally has an impact on performance, and anything that can be done to enhance performance is a key part of reducing elapsed times.

Recommendations and guidelines

  • Avoid using a network, network-attached storage (NAS) or universal serial bus (USB) drive.
  • SSD or, preferably, NVMe storage devices offer higher I/O rates than classical HDDs and are strongly recommended for the filesystem used to store temporary data—this is called scratch space.
  • If you use HDD for scratch storage, the rotational speed of the disk (which ranges from 7000 to 15000 RPM) can have an impact; faster disks typically contribute to better performance.
  • Run MSC Nastran with the command line option SCR=YES if you do not need to save to a database.  With SCR=YES, some temporary data is deleted after use rather than being moved to the database files (a submission example is sketched after this list).
  • Using device striping for the scratch filesystem (RAID-0) is highly recommended.  Multiple media storage units (HDD, SSD or NVMe) may be configured so that I/O to the devices occurs simultaneously, essentially providing linear scaling of the I/O speed.  Both Windows and Linux can create RAID-0 striped file systems at the O/S level (no special controller is needed).  There is a limit to this scaling imposed by the conduit that communicates the device I/O to memory (called the front side bus or FSB).  Other levels of RAID (e.g., RAID-5) have purpose-built redundancy which is not needed for scratch data and may lead to a degradation in performance.
  • On Linux systems, use the XFS filesystem for the scratch filesystem; a journaled filesystem is not required for temporary data.
  • If you are on a budget, it is better (and cheaper) to have 4 HDD devices striped in a RAID-0 than a single NVMe.
  • Beware: not all SSD/NVMe devices are equal.  Inspect both the read AND write speeds of the device when choosing scratch storage. In write-once, read-many systems, read performance is more important than write performance, but with MSC Nastran both need to be fast to achieve optimal performance.
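
As a sketch, a run that keeps temporary data on a fast, striped scratch filesystem and deletes it after use might be submitted as below (the path is hypothetical, and sdirectory is assumed here as the keyword that points MSC Nastran at the scratch filesystem):

nastran my_model.dat memory=max scr=yes sdirectory=/raid0_scratch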

Figure 5 shows MSC Nastran performance for two jobs comparing SSD and HDD (SCSI) media:


Figure 5: Comparison of Hard Disk Drive (HDD) to Solid-State Drive (SSD) disk for two models.

CPUs

As when trying to choose a good deal on a new mobile phone from a bewildering array of options, a clear winner is rarely obvious.  It is impossible to give specific advice about which processor is the “best” option, but some guidelines can help you make a decision.  Often the choice of CPU types and features will be restricted by other hardware choices, such as the architecture of the motherboard, which imposes a socket configuration for the CPU.

By now it should be clear that there is no one specific item in a computer system to target when considering performance—it is an ensemble that matters most.  There is little point in having an extremely fast CPU, from which the solver phase will benefit if the I/O system is going to cause a bottleneck and increase the elapsed time.  Likewise, there is no point in having an extremely rapid I/O system if the CPU is slow, or the system has insufficient memory.  It is better to target a system that will work well in harmony. This generally means you don’t need the latest and greatest generation CPU and you can target a model that is perhaps one or two years old.  With new models and technologies coming out all the time, the cost of superseded CPU models drops off rapidly after their introduction, and the budgetary savings you make here can be used to invest in more memory or more devices for a striped I/O system, for example. 

When trying to identify a good bang for your CPU buck, seek out benchmark websites that compare CPU speed among various processors from all the vendors.  The benchmarks tend to be broad in scope, hitting different aspects of CPU technology, so if the processor is fast at doing, say, integer operations, but not so good at flops (floating point operations), or vice versa, these characteristics reveal themselves in the results.

With the current push for eco-friendly hardware, CPU chips often sport what is called an efficient mode (using E-cores) or performance mode (using P-cores).  While this might be great for a computer running a database application where the CPU is solicited infrequently, or for a mobile device where battery consumption may be a concern, for compute server systems plugged into mains power, running on E-cores will hamper performance.  Ensure the CPU you choose provides the ability to disable any E-cores, or at least to force the use of P-cores at all times.

Most CPUs today support hyperthreading. While hyperthreading can be useful for some applications that tend to be interactive and of short duration, for MSC Nastran computations experience shows that disabling hyperthreading results in a marginal increase in performance.  You will know if hyperthreading is enabled on a system because the number of logical CPUs will be double the number of physical cores across the CPU sockets; hyperthreading makes it look as if the CPU has more cores than it really has.

Each new generation of CPU brings with it new logic to speed up operations such as aggregating operations with clever access to registers in the CPU, or pipelining so that multiple operations are possible in a single CPU clock cycle.  In the past, we have seen the Streaming SIMD Extensions SSE, SSE2, SSE3, SSE4 and Advanced Vector Extensions AVX (Gesher New Instructions), AVX2 (Haswell New Instructions), AVX-512, AVX10 and most recently Advanced Performance Extensions APX.

The most important takeaway here is that the development of MSC Nastran strives to support the latest compilers and vendor libraries which in turn support the newest extensions, so, generally speaking, if features are available on your system they will be used. If those features are absent, things will still work.  This compatibility and the regular introduction of new capabilities are good reasons to adopt the latest version of the MSC Nastran software as it becomes available.


5. Solver description

MUMPS 

The MUMPS solver (Multifrontal Massively Parallel sparse direct Solver) is a high-performance numerical computing tool used to solve large systems of linear equations. It efficiently solves sparse, symmetric, and positive definite linear systems by employing a multifrontal method that leverages parallel computing techniques. MUMPS breaks down the problem into smaller subproblems, factorizes the sparse matrix, and solves for the unknowns concurrently to provide the solution. MUMPS is currently available for linear statics (SOL 101) as well as normal modes computation (SOL 103) and buckling analyses (SOL 105).
To activate MUMPS, add the following to the Executive Control Section:

SPARSESOLVER DCMP(FACTMETH=MUMPS)
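
For context, a minimal Executive Control Section for a linear statics run with MUMPS selected might look like the sketch below (SOL 101 and CEND are standard statements; the placement shown is an assumption based on the description above):

SOL 101
SPARSESOLVER DCMP(FACTMETH=MUMPS)
CEND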

MSCLDL

The MSCLDL solver has its roots in the sparse solver technology developed for version 69 of MSC Nastran in the late 1990s.  Since then, the solver has been steadily improved to allow efficient solutions for dense problems, as well as asymmetric and complex matrix topologies.  The solver was originally developed to run with limited memory settings, so the focus had to be on out-of-core solutions; as a consequence, it has limited parallel scalability.  However, it remains an extremely robust solver and can be leveraged when insufficient memory is available to solve the problem in core.

To activate, the SPARSESOLVER command needs to be added:

SPARSESOLVER DCMP(FACTMETH=MSCLDL)

Pardiso

The Pardiso solver (Parallel Direct Sparse Solver) is a high-performance numerical computing tool used for solving large systems of linear equations. Pardiso employs direct methods to solve sparse, symmetric, and positive definite linear systems efficiently. It utilizes parallel computing techniques to distribute computational tasks across multiple processors or cores, enabling faster solution times for complex engineering problems. The Pardiso solver consumes more memory than MSCLDL depending on the model type, but it can exhibit greater performance with SMP and DMP parallel settings. To activate the Pardiso solver, the SPARSESOLVER command needs to be added:

SPARSESOLVER DCMP(FACTMETH=PRDLDL)

CASI

The CASI solver is an iterative solver that is less dependent on memory than the direct solvers.  Targeted at solid-dominant models, that is, models where solid elements make up more than 80% of the total number of elements and there are few load cases (<64), the CASI iterative method can provide a significant reduction in solution time.  As a mixture of element types disrupts the solid dominance, or as many load cases need to be solved, the iterative method becomes less attractive and one of the direct solvers will typically provide better performance.  The CASI solver is activated by adding the following in the Case Control Section:

SMETHOD=element
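
Since SMETHOD is a Case Control command, a minimal sketch of its placement for a solid-dominated linear statics job might look as follows (the subcase structure and the load/constraint set IDs are purely illustrative):

SOL 101
CEND
SMETHOD=element
SUBCASE 1
  SPC = 1
  LOAD = 2
BEGIN BULK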

Figure 6 demonstrates a set of guidelines for SMP usage. First, a general definition of what is considered a small, medium, large and very large job:

 Problem Size   Model Types                                    #DOFs
 Small          Solid-Element Models, All Other Model Types    #DOF < 2M
 Medium         Solid-Element Models, All Other Model Types    2M < #DOF < 10M
 Large          Solid-Element Models, All Other Model Types    10M < #DOF < 20M
 Very Large     Solid-Element Models, All Other Model Types    20M < #DOF

 Method/Size   Small      Medium     Large       Very Large
 MSCLDL        SMP=1-2    SMP=3-8    SMP=8-16    SMP=8-16
 PRDLDL        SMP=1-8    SMP=8-16   SMP=16-32   SMP=16-64
 MUMPS         SMP=1-8    SMP=8-16   SMP=16-32   SMP=16-64
 CASI          SMP=1      SMP=1-2    SMP=3-4     SMP=4-8

Figure 6: Recommended SMP values for different problem sizes and method types

Lanczos

The Lanczos method, named after the Hungarian mathematician Cornelius Lanczos, is an iterative numerical algorithm used to approximate the eigenvalues and eigenvectors of large, sparse, symmetric matrices. It uses Sturm sequence logic, named after the French mathematician Jacques Charles François Sturm, to ensure that all modes are found in the requested range. When required to extract a relatively small number of eigenvalues, the bulk of the computation time for Lanczos is spent during symmetric factorization.  This factorization embodies virtually the same compute considerations as the solution of the linear static problem, so either of the two sparse direct solvers MSCLDL or MUMPS (including MUMPSBLR) may be used.

Lanczos is generally recommended when few roots (eigenvalues) are sought or when the problem is considered small. For larger problems, or problems involving many modes (or both), the Automated Component Mode Synthesis (ACMS) method is recommended to solve the eigenvalue problem.  The ACMS solution strategy involves using the Lanczos method to solve a series of smaller problems.

The Lanczos method of eigenvalue solution begins with defining an EIGRL entry in the MSC Nastran input file, where V1 and V2 may define the frequency interval in which to extract modes. Alternatively, the number of modes to extract may be defined in the ND field of the same entry.
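
As an illustration, the free-field entries below show both forms (all numbers are illustrative; set ID 10 would be selected in Case Control with METHOD = 10):

$ Request all modes between V1 = 0 Hz and V2 = 600 Hz
EIGRL, 10, 0.0, 600.0
$ Alternatively, request the first 50 modes via the ND field
EIGRL, 10, , , 50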

ACMS

ACMS is an acronym for Automated Component Mode Synthesis, and it is the method of choice for solving large eigenvalue problems. The eigenvalues produced by ACMS are computed using reduced stiffness and mass matrices, and as with any component mode synthesis method, it is subject to an approximation due to mode truncation.  To a great extent, the approximation may be mitigated using residual vectors, and the modal basis produced by the method may be used in the same way as a classical modal basis for downstream operations.  The pay-off with ACMS is a much faster route to the final modal basis, and one that scales very well with the problem dimension.  As the problem size increases, elapsed times relative to solving the entire problem with the direct Lanczos method improve further and further, and cases where the method is 100 times faster have been demonstrated, as shown in Figure 7. To activate ACMS use the executive control command:

DOMAINSOLVER ACMS



Figure 7: Performance Comparison of Lanczos and ACMS (for more details on the model please refer to the HPC User’s Guide – Chapter 6)

Parameters affecting ACMS

Several parameters can affect the accuracy and performance of ACMS. The most important ones are presented below.

Eigenvalue frequency range for modal extraction

For structural analysis, the tendency is to request the lowest modes of the system.  This is generally because the effective mass of the structure makes it difficult to solicit significant responses at the higher frequencies, so the lower-frequency behaviour dominates the total response.  However, dynamic amplification due to the presence of damping means responses in the higher frequencies still have a cumulative effect, so in general eigenvalues need to be computed to a higher frequency than the highest excitation frequency.  The form of the dynamic amplification curve shows that once the ratio of normal mode frequency to excitation frequency reaches around 3, the response of the system tends to return to a static response, so modes are typically computed up to about three times the highest excitation frequency.  This upper bound on the frequency for computation of the eigenvalues can be defined in the V2 field of the EIGRL entry in the input file.

In an attempt to reduce elapsed solution time, you may be inclined to compromise accuracy by defining a V2 value lower than three times the highest excitation frequency.  However, the solution methods in MSC Nastran in general, and ACMS in particular, are so efficient that the cost of the additional eigenvalues required for reliable accuracy is not inordinately expensive in terms of time.

The only time it may be advisable to compromise this guideline is when many thousands of eigenvalues are required for high frequency (for example, 50,000 eigenvalues or more.)

  • PARAM, RMRBE3RT,1
    This parameter should be activated to remove rotational DOFs from the REFC field of any RBE3 whose reference grid point (REFG) is used by solid elements only and REFC has rotational DOFs. 
  • UPFACT
    ACMS uses an automated method of component mode synthesis (CMS).  As the CMS moniker suggests, the method uses a modal analysis of a component part of the problem, where the modes are computed with boundary conditions adapted to the method.  These boundary conditions are typically fixed and do not represent any physical connection, so an approximation is introduced.  The approximation can be mitigated by computing the component modes over a larger frequency range than the range used for the final modes; how much larger this range should be (defined via the UPFACT parameter) is somewhat model-dependent, but experience has shown that a component-mode frequency range of at least twice the final-mode frequency range is a minimum.  Studies have also shown that increasing this range improves accuracy for little increase in the cost of the component eigenvalue solution, and that above an UPFACT value of 4 or 5 there are diminishing returns.  Consequently, you may increase the frequency range for the automated component mode synthesis using the UPFACT parameter, which is set on the DOMAINSOLVER executive control entry (an example entry is sketched after this list). The higher the UPFACT value, the higher the accuracy of ACMS, but the computation will take a little longer.
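
As an illustration, an executive control entry requesting ACMS with the component-mode frequency range factor set to 4 might look like the line below (the parenthesised describer syntax is an assumption; consult the Quick Reference Guide for the exact form):

DOMAINSOLVER ACMS (UPFACT=4.0)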

Some other parameters useful for the ACMS solver are briefly mentioned here for the sake of thoroughness. ACMS has two distinct mechanisms to fix massless mechanism problems automatically (MECHFIX and PARAM,MMETH,BETA). Furthermore, ACMS supports the SPCD method as opposed to the obsolete large mass approach, which poses several limitations. Residual vectors are also supported to account for modal truncation effects. Finally, a new feature to analyze singularities has been developed: when a run fails due to numerical singularities, a file called singularities.bdf is automatically output for additional visual inspection by the user. The option to automatically convert shell elements to membrane elements is also supported with the SLDSKIN command, to avoid singularities caused by the low bending stiffness of thin shell elements.

FastFR

The linear frequency response problem is one in which both the applied excitation and computed response are assumed to be harmonic.  The equation of motion for frequency response is formulated in such a way that the addition of the mass, damping and stiffness with appropriate scaling by powers of the excitation frequency, and in the case of damping the imaginary operator, form a single matrix called the dynamic stiffness.  One or more complex excitation loads are applied to the factorized dynamic stiffness at each excitation frequency. 

The solution procedure resembles a series of static analyses, except now the static stiffness is replaced by the dynamic stiffness and the problem is computed in complex arithmetic at each excitation frequency.  Given specific circumstances, the frequency response problem may be solved differently than the traditional method involving factorizing the dynamic stiffness.  If regional-dependent structural damping is defined via the GE field of several bulk data entries in the input file, the method may introduce an approximation associated with the ability to represent the damping sufficiently close to the physical behaviour. 

In MSC Nastran, the FASTFR method can be a significantly faster method for solving the modal frequency response problem and the method is selected automatically when specific circumstances occur.  For example, if the number of modes is small, it is unlikely the FASTFR option will provide much benefit over the default direct factorization solution, so a minimum problem size defines a threshold; this may be moderated using the PARAM,FFRHMAX entry.  In addition, concerning the regional structural damping matrix K4, two aspects are assessed to determine if the FastFR method is tenable.  First, if the rank of the K4 matrix is large, the equivalent problem that FastFR solves will need to be large, and there comes a point where there is little benefit in solving the equivalent problem.  Second, the K4 damping matrix must be decomposed into a specific form so that the FastFR method may be used.  If there are significantly different values of structural damping factor in many different regions across the model, then it can be challenging to derive the form necessary for the FastFR method.  Currently, an assessment is made of the coupling in the K4 matrix, and if it is too high, FastFR will be deactivated.  Many cases result in low degrees of coupling, and the FastFR method can yield significant gains in elapsed time for these situations. 

From the user perspective, you may decide to assess the coupling in the structural damping matrix to override the automatic logic and force the use of the FastFR method.  If global structural damping is defined via PARAM,G and/or one or more discrete regions of GE damping are defined via material and element entries, the resulting coupling in K4 may be low enough for FastFR to provide some gain; the FastFR method is also available for symmetric and asymmetric solutions, and can be forced by defining PARAM,FASTFR,YES in the input file.

In summary, the automatic selection of whether to use the FastFR method is the default as set by the parameter PARAM,FASTFR,AUTO.  To force the FastFR method, define PARAM,FASTFR,YES and to deactivate the method define PARAM,FASTFR,NO.  When PARAM,FASTFR,AUTO is set, the problem size threshold is controlled via PARAM,FFRHMAX (default=2500), i.e. the number of modes in the problem must meet or exceed this value for FastFR to engage.
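
Restated as Bulk Data entries, a minimal sketch using only the parameters and default quoted above:

$ Let MSC Nastran decide automatically whether to use FastFR (the default)
PARAM,FASTFR,AUTO
$ Modal-size threshold used by the automatic logic (default value quoted above)
PARAM,FFRHMAX,2500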


6. Solve=auto

As we have seen, there are many aspects to choosing an effective method to solve any given problem, from selecting the solver to memory settings and parallel specifics; this can be challenging.  Help is at hand.  Specifying SOLVE=AUTO in the run configuration file or on the command line is a simple way to select the optimal settings for a job automatically. The settings are derived by implementing machine learning for memory prediction for the Pardiso solver as well as a wealth of heuristics from MSC performance tests. The SOLVE=AUTO process takes the following steps to decide the optimal settings:

  • Determine model characteristics (SOL, DOFs, dominant element type).
  • Determine the best solver (MUMPS, Pardiso, CASI, ACMS, etc.). CASI=NO disables the CASI solver, ACMS=YES enables the ACMS solver and ACMS=NO disables the ACMS solver as options.
  • Determine memory requirements (for a static solver).
  • Examine hardware availability (memory / CPU).
  • Select Solver.
  • Select memory and BPOOL.
  • Select how to partition cores (SMP / DMP).
  • Optionally, with NORUN=YES, print the optimal setting information without running the job.

The logic used to determine good settings is defined in a decision tree, which you can find in Chapter 8 of the HPC User’s Guide. See “Automatic Solver selection” for more details.

solve=auto norun=yes

If you are running MSC Nastran in a cluster environment, it is sometimes useful to assess the resources needed for a particular analysis, but often this means running the full job. Using the combination solve=auto norun=yes on the command line requests MSC Nastran to run through the startup procedure for the job but stop before the analysis phase.  The output file generated from this procedure provides the command line required for the simulation in a ready-to-run format.  In this way, the solve=auto logic can be used to determine ideal settings for the job, which can then be used to run a second job (where norun=no, i.e. the default).
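
A sketch of such a sizing run, assuming the standard nastran submission command and a hypothetical input file name:

nastran my_model.dat solve=auto norun=yes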

By estimating the best settings, solve=auto enables users to submit jobs on larger computer configurations, preempting potential crashes caused by insufficient memory or inappropriate solver selection. This option enables you to determine the optimal settings without executing the full analysis, allowing for customized preferences during submission.

For more information please visit the MSC Nastran HPC User’s Guide or our website.