NEC User Group Confidential (Do not distribute)

# **NEC's contribution to future HPC market**

June 12th, 2024 MASAKI KONDO, Senior Director

> HPC Department, Infrastructure Technology Service Division, Digital Platform Business Unit
> / Education and Science Solutions Department, Government and Public Solutions Division, Public Solutions Business Unit
> NEC Corporation

# Agenda

1. NEC's HPC business policy and activities
   - Recent installations
   - CPU/GPU roadmaps
   - Scope for HPC and Data Utilization / Data production
2. Characteristics of vector technology and future plan
3. Summary

In the academic, enterprise, and administration business domains, we provide HPC platforms, SI, and services that scientific and R&D organizations use to run large-scale scientific calculations effectively and efficiently, helping them solve social and business issues.




# Recent installations in Japan

#### **Tohoku University "AOBA-S"**



- The World's fastest vector computer (21.05 PFLOPS)
- 504 nodes of SX-Aurora TSUBASA new models
- Computational performance 14x greater than the existing system
- Top500 61st place, HPCG 13th place (June 2024)

#### University of Tsukuba "Pegasus"



- First system in Japan to adopt H100 GPUs and Sapphire Rapids CPUs (Dec. 2022)
- Green500 20th place (June 2024), 41.12 GFLOPS/W, Japan No.1
- 120 nodes of Intel Xeon Platinum 8468 (SPR) + NVIDIA H100 GPU (6.5 PFLOPS)

#### Osaka University "handai-mdx"



- Virtualization infrastructure used for various data utilization purposes
- Red Hat OpenStack and VMware
- Coordinates with the MDX system operated at the University of Tokyo
- 60 nodes of Intel Xeon Platinum 8480+ (SPR) x2

#### JAMSTEC "Earth Analyzer"



- Virtualization infrastructure used for data analysis, publication, and dissemination
- VMware vSphere and NSX (Firewall)
- 30 nodes of Intel Xeon Gold 6430 (SPR) x2
- Nodes and storage are connected with 100 Gbps Ethernet (RoCE)

© NEC Corporation 2024

# Recent installations in EMEA

#### DWD



- Phase 2b (adopting Aurora3) completed; operation started in the 2nd half of 2023
- Phase 3: contract ends 2023, installation Q4 2024
- Production of Milan Cluster: Q2 2023
- An additional system will be installed and start operation soon

#### Universität Hamburg Hummel-2



- 178 CPU nodes with 2x AMD EPYC 9654 CPUs (Genoa)
- 6 Huge Memory Nodes with 6TB RAM
- 4 GPU nodes with 8x NVIDIA H100 SXM
- BeeGFS Storage 5PB usable
- Cornelis OPA Network
- Direct water cooling

#### Universität Tübingen

#### **RWTH Aachen CLAIX-2023**




- National HPC (NHR) System for Computational Engineering Science applications
- 632 nodes with 2x Intel Xeon Platinum 8468 CPUs (Sapphire Rapids)
- 52 AI nodes with 4x NVIDIA H100 in a dedicated Machine-Learning partition
- Direct water cooling



#### **Expansion Level 1**

- 5x GPU-Nodes with 8x NVIDIA H100
- 4x Hypervisor-Systems

#### **Expansion Level 2**

- 10x GPU Nodes with 8x NVIDIA H100
- 2x CPU Nodes with 2x AMD EPYC 9654
- 2x Hypervisor-Systems

# **Research information infrastructure enhanced from HPC**

NEC will provide a total infrastructure solution to generate, simulate, aggregate, utilize, and manage research data.



# Case Studies of Research Information Infrastructure

Osaka University's data aggregation platform "ONION" connects the HPC system "SQUID", the data processing system "handai-mdx", public cloud, and various devices via the S3 protocol.



JAMSTEC operates the Earth Analyzer and a tape storage system in addition to the HPC system "Earth Simulator".
Simulation data is utilized through several applications and archived efficiently with HSM.




# Filesystem/storage

As research information infrastructure becomes increasingly important, file systems and storage will become even more critical.





- Mature filesystem with high reliability and rich functionality
- Formerly known as GPFS





NEC SSE500 Storage Array

NEC HPC2112RK-1 Storage Server



- Good support (in Japan)
- High-density, high performance
- Hybrid and full-flash storage solutions.





- For customers with IO intensive workloads
- Used in a wide range of HPC environments.





NEC SNA 060i/120i High-Density JBOD

NEC SSE400 Storage Array



- Specialized for AI-Workloads
- Optimized for NVMe.
- Tight S3 integration



NEC HPC2212RK-1 Storage Server



- Good price / performance
- Good for I/O intensive workloads



NEC SNA 060i/120i High-Density JBOD



- Cloud tiering
- Remote distribution




# Characteristics of Vector technology and future plan






## NEC long vector's competence (vs CPU)

> Higher effective performance, Higher performance/power ratio

When a large number of elements are processed simultaneously (vector, SIMD), the long-vector architecture delivers high performance.

- Higher performance/power ratio for memory-intensive applications
- Accurate calculation of simulation results, and therefore repeatable
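A minimal sketch of why a longer vector length reduces instruction count: applying one operation to n elements takes ceil(n / VL) vector instructions. The function name and the two vector lengths (8 and 256 64-bit elements) are illustrative assumptions, not figures from this presentation, and instruction count alone does not determine real performance.

```python
def vector_ops_needed(n_elements: int, vector_length: int) -> int:
    """Vector instructions needed to apply one operation to n_elements
    when each instruction handles up to vector_length elements."""
    if n_elements < 0 or vector_length <= 0:
        raise ValueError("n_elements must be >= 0 and vector_length > 0")
    # Integer ceiling division: the last instruction may be partially filled.
    return (n_elements + vector_length - 1) // vector_length


# Assumed, illustrative vector lengths in 64-bit elements.
SHORT_SIMD_LENGTH = 8
LONG_VECTOR_LENGTH = 256

n = 1_000_000
short_ops = vector_ops_needed(n, SHORT_SIMD_LENGTH)
long_ops = vector_ops_needed(n, LONG_VECTOR_LENGTH)
print(f"short SIMD: {short_ops} instructions, long vector: {long_ops} instructions")
```

Fewer, longer instructions mean less instruction-issue overhead per element, which is one ingredient of the long-vector advantage described above.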


## Architecture policy: peak performance, or sustained performance with well-balanced memory bandwidth

Footprint ratio comparison of Arithmetic Unit (AU) / Register / Memory Access on LSI



#### *Vector architecture is designed to better suit memory-intensive applications*
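The difference between peak and sustained performance can be illustrated with the widely used roofline model: attainable performance is the smaller of the peak compute rate and memory bandwidth multiplied by arithmetic intensity. The sketch below is an illustration only. Both machine profiles and the kernel intensity are hypothetical values chosen for this example, not measurements of any NEC or competitor product.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Roofline model: sustained FLOP/s is capped by either peak compute
    or by memory bandwidth (bytes/s) times arithmetic intensity (FLOP/byte)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


# Hypothetical machine profiles (assumptions for illustration only).
peak_oriented = {"peak_flops": 10e12, "mem_bandwidth": 1e12}  # high peak, less bandwidth
balanced = {"peak_flops": 4e12, "mem_bandwidth": 2e12}        # lower peak, more bandwidth

# Hypothetical memory-intensive kernel: 0.25 FLOP per byte moved.
intensity = 0.25

sustained_peak_oriented = attainable_flops(**peak_oriented, arithmetic_intensity=intensity)
sustained_balanced = attainable_flops(**balanced, arithmetic_intensity=intensity)
print(sustained_peak_oriented, sustained_balanced)
```

Under these assumed numbers the balanced design sustains twice the throughput on the memory-bound kernel despite its lower peak.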

## Calculation accuracy of simulation results

#### Vector is faster and more energy efficient



# Performance and power consumption comparison using ICON (DWD NWP application)

- VE30A: 7 nm process rule
- A100-80GB: 7 nm process rule

Power consumption does not include the CPU part.

Source: Panagiotis Adamidis, et al., "The Real Challenges for Climate and Weather Modeling on its Way to Sustained Exascale Performance: A Case Study using ICON (v2.6.6)"

https://gmd.copernicus.org/preprints/gmd-2024-54/

### **VE30** is **faster** and **more energy efficient** than A100


## Calculation accuracy of simulation results

#### **Errors of Division / SQRT**

| Accelerator | Method                                             | Error in Division | Error in SQRT |
|-------------|----------------------------------------------------|-------------------|---------------|
| NEC Vector  | Dedicated Arithmetic Unit                          | No Error          | No Error      |
| NEC Vector  | **Reciprocal Approximation** (compiler code based) | No Error          | With Error    |
| GPGPU       | Reciprocal Approximation (function call)           | With Error        | With Error    |

**No Error: IEEE 754 compliant**
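A minimal sketch of why reciprocal-based division can deviate from a correctly rounded IEEE 754 quotient: multiplying by a rounded reciprocal rounds twice, while a true division rounds once. The helper name is hypothetical, and the sketch uses a full-precision reciprocal in ordinary double arithmetic; it does not model the NEC or GPGPU hardware, whose approximation and correction steps differ in detail.

```python
def divide_via_reciprocal(a: float, b: float) -> float:
    """Divide by multiplying with the rounded reciprocal of b.
    Two roundings occur: one for 1.0 / b, one for the product."""
    return a * (1.0 / b)


# Compare against the correctly rounded quotient for n / n, which is exactly 1.0.
mismatches = [
    n for n in range(1, 100)
    if divide_via_reciprocal(float(n), float(n)) != float(n) / float(n)
]
print("n where n * (1/n) != n / n:", mismatches)
```

Without an extra correction step, the last bit of the result can differ, which is the kind of error the table distinguishes from IEEE 754 compliant results.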

#### Non-repeatability of results by Memory processing / Synchronization

| Architecture | Memory processing                                   | Synchronous processing                         | Result                                        |
|--------------|-----------------------------------------------------|------------------------------------------------|-----------------------------------------------|
| NEC Vector   | Order of memory writes is assured                   | Synchronizes computation among processor cores | Unique                                        |
| GPGPU        | Order of memory writes within an SM is not assured  |                                                | Not unique (the summation order is not stable) |

SM: Streaming Multiprocessor
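The reason an unstable summation order changes results is that floating-point addition is not associative. The sketch below sums the same three values in two orders; the function name and the values are assumptions chosen to make the effect visible, not data from this presentation.

```python
def sum_in_order(values, order):
    """Accumulate values in the given index order using double precision."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1e16, 1.0, -1e16]

# (1e16 + 1.0) - 1e16: the 1.0 is lost when added to the large value.
result_a = sum_in_order(values, [0, 1, 2])
# (1e16 - 1e16) + 1.0: the large values cancel first, so the 1.0 survives.
result_b = sum_in_order(values, [0, 2, 1])

print(result_a, result_b)
```

When the accumulation order depends on thread scheduling or memory write timing, two runs of the same program can legitimately produce different sums.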

## Vector Architecture Roadmap



Subject to change

Openchip Software Technologies (OCT) plans to productize a vector chip using BSC IP.
NEC will collaborate on this development, contributing its IP and/or vector experience.

- Existing vector code should be able to run on this vector chip with minimal porting and tuning effort
- NEC is considering providing servers with the next vector chip developed by OCT


# **Discussion of collaboration for Vector Accelerator**

### 2023



**Barcelona Supercomputing Center** Centro Nacional de Supercomputación



BSC and NEC have been working together since last year.

We have been discussing how to collaborate on the next vector chip.

# **Productization Scheme For Vector Accelerator**

### 2024



# **Business Scheme For Vector Accelerator**



# Video message from Prof. Mateo Valero (BSC)



BSC and NEC have been working together since last June

NEC has the best team in the world for vector architecture

BSC created Openchip to develop high-performance vector architecture chips

There is no doubt that we will make a very good vector chip




NEC will work on the latest CPU and GPU solutions in collaboration with CPU/GPU vendors.

NEC will provide a total research information infrastructure solution to generate, simulate, aggregate, utilize, and manage research data.

NEC will continue delivering vector solutions for accuracy, higher performance, and higher energy efficiency in collaboration with BSC/OCT.

# Vector technologies will continue



Your warm support for the new vector accelerator will be appreciated.


# **Orchestrating** a brighter world

NEC creates the social values of safety, security, fairness and efficiency to promote a more sustainable world where everyone has the chance to reach their full potential.