Storage as just one COG in the Cloud – Part 2
Picking up where we left off in part one, there has been a bit of change in the EDA-for-Cloud circle. Hopefully we continue this drive, and DAC-2019 brings us some more really good presentations. I might cross-post this to my home blog site at RPMSystems.IO; that site should have other interesting posts as time goes on.
During the chip-design lifecycle, with the ebb and flow of compute needs, the ever-increasing project size, and ongoing die shrinks, it is key to co-design the network, compute, and storage at the same time and in the optimal proportions. With a COG approach, you can get away from the painful “reactive buys” of individual subsystems to address imbalances. To diverge a bit from EDA, I will give an example from the Media and Entertainment business. When a large company such as Disney, Pixar, Lucasfilm, or Weta Digital starts a new blockbuster movie, the executive producers first invest millions of dollars in the compute and storage for that particular movie. At the completion of the movie or series, most of the equipment is retired because it is now out of date, and the next movie will have its own budget for modern equipment. (This trend is changing too.) Just like EDA, time is of the essence: they need quick production and turnaround, so they need to be using the fastest and most modern equipment. This is a bit overgeneralized, but this quick turnover of hardware is easier for them because the scale is much smaller than at large EDA companies, which have many overlapping projects. Wouldn’t it be nice if an EDA company could quickly and effortlessly scale up for the tape-out phase of a large project, minimize the planning, spend, and op-ex resources needed to update a segment of the compute grid to handle the peak for only a month, then spin down that COG and save on the chargebacks the next month? Sure, there will be another month-long peak coming in a couple of months, but it should be obvious that this ebb and flow needs to be minimized when operating at this scale.
This is where the Cloud Oriented Grid makes a logical best fit. If this has been so obvious to executives, financial analysts, and IT personnel for years, why has it not succeeded in the early public clouds like Amazon? The reasons are virtualization and object storage, which are the basis of the first-generation public cloud. EDA apps don’t know how to deal with object storage (they only want a POSIX filesystem), and there is no need for virtualization since LSF is the scheduler and it keeps all 20 cores busy on all machines all of the time. Almost all public cloud offerings before 2017 used hypervisors and virtualization (similar to VMware ESX, QEMU, KVM, Docker, etc.). Again: EDA applications do not need virtualization; they need bare metal and RedHat/CentOS or SUSE Linux running on thousands of compute nodes. It is ironic that the main design features of the Cloud (virtualization and object storage) are the biggest impediments for EDA.
In 2007, I attended the DAC conference. The highlight of the conference was a 20-person panel of experts from the ISVs, integrators, and, of course, Amazon. Cloud was all the BUZZ. The ballroom was standing room only to hear the great proclamations that the cloud was here to revamp EDA just as it had done for WEB2.0 and other dot-commerce that seemed to just drop right in and scale out. The following year at DAC, the cloud presence was quite a bit smaller (hauntingly gone), and the big partnership of Synopsys and AWS was not advertised like the year before. The reasons why were mentioned above, and they are still valid to this day, even though many EDA companies have had long-running internal projects and engagements with cloud companies to “port the workflow to the cloud”. I sat next to one of those very talented developers/programmers for the last 5 years. I felt his pain over the cubicle wall. He had success, of course, but only for a particular tool/flow, and the engineering effort and workflow change were very significant. For the amount of effort, I never could envision porting all the various workflows. It was like trying to fit a square peg into a round hole.
Ten years after that first DAC Cloud revelation, this time with help from HPCPros.org and Derek’s “Infrastructure Alley”, the Cloud companies are finally returning to the vendor floor at DAC. I asked my friends (one of whom is now working with AWS as their EDA architect, and others at IBM and Google) if this was really the year in which the cloud was going to be compatible with EDA. They all responded that they felt so, because they had some fundamental changes and offerings (bare metal and high-performance or co-located storage) which would be the new foundation. They said that at this point, all that is missing is a third-party expert group (like me) with the ambition and the real-world work experience to architect, develop, and integrate the pieces, making it seamless for EDA customers, and especially engineers, to work in. Luckily for me, I had this same vision a few weeks before, and had printed up business cards for my new company: RPM Systems.
In Infrastructure Alley, there are the big cloud companies such as AWS, Google, IBM, and Alibaba Cloud. There are storage companies such as Pure Storage, EMC, and IBM. There are scheduler companies such as Univa and IBM. It feels good to see familiar and welcoming faces, now that I’m on the other side as an independent consulting company specializing in EDA/HPC/Storage. It is obvious from the presentations on stage that the industry is embarking on a reboot of the cloud challenge, and there are still many rough edges which are not yet solved.
A week after DAC, I was approached by some friends at Oracle, who asked my opinion on the upcoming Oracle Cloud offering specifically aimed at HPC. After hearing more about it over lunch, I shared my excitement and exclaimed that they had covered three of the main impediments to previous cloud attempts: bare-metal servers, a minimally oversubscribed network, and high-performance NFS storage. As previously mentioned, I have extensive experience with Oracle ZFS, and even though the native Oracle Cloud NFS offering (FFS) may someday perform up to the challenge, I convinced them that the Oracle ZFS Appliance is a known quantity and commonly used in EDA, and that if it could be co-located on the same network as the compute nodes, the engineers and IT support would feel right at home with respect to stability and performance. To properly support the tremendous NFS IO, the network would need to be top-notch. They have that covered with a high-speed, non-blocking network and some upcoming roadmap announcements which will put the icing on the cake. The next hurdle would be a modern, easy-to-provision (with PXE boot) bare-metal offering where the customer could install their standard image and dynamically expand or shrink the cluster (in conjunction with LSF re-configs, of course). After that, data replication is next in order, and Oracle again seems to have the best model, without any data-egress fees from the cloud storage. The customer pays familiar costs through familiar processes with traditional ISP carriers. I’ve begun work on RPMSync™, a ground-up development which leverages my many years of filesystem replication experience. It will implement modern management and transport features, and some very beneficial data-reduction strategies which will help storage and network costs and also improve performance. File synchronization is one of the key challenges which all EDA companies have found when going to the cloud.
There are quite a number of solutions out there, but RPMSync™ will specialize in EDA needs and be one more option for the customer, with a twist: It can be third-party managed by RPM Systems, thus reducing the load on internal IT support.
The bare-metal HPC offerings now available at Cloud providers such as AWS, IBM, Microsoft, and Oracle overcome the previous integration issues and overhead of hypervisor-virtualized systems. As previously mentioned, compute grids use schedulers such as LSF, and therefore the virtualization of the first-generation Cloud added even more management complexity. Most cloud users would say that Kubernetes on top of Docker containers is not much of a burden, but EDA and HPC users are comfortable with LSF and SGE and already have enough to worry about with those. Additional workflow steps are needed to set up, and later tear down, a container per compute job. This comes at an expense in performance, especially when many jobs last less than 5 minutes. Even more expense is incurred when something goes wrong and you need to trace logs and job reports from different schedulers. Once again, we need to do what we can to Keep It Stupid Simple.
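To put a rough number on the per-job container cost described above, here is a minimal back-of-the-envelope sketch in Python. The function name and the 5-second setup and teardown figures are assumptions chosen purely for illustration, not measurements from any real cluster.

```python
def container_overhead_fraction(job_seconds: float,
                                setup_seconds: float,
                                teardown_seconds: float) -> float:
    """Return the fraction of total slot time spent on container setup/teardown."""
    overhead = setup_seconds + teardown_seconds
    return overhead / (job_seconds + overhead)


# Hypothetical figures: 5 s to start a container and 5 s to tear it down.
for job_seconds in (60, 300, 3600):
    fraction = container_overhead_fraction(job_seconds, 5.0, 5.0)
    print(f"{job_seconds:>5d} s job: {fraction:.1%} of slot time is container overhead")
```

The point of the sketch is only the shape of the curve: a fixed per-job cost is negligible for hour-long runs but becomes a visible share of CPU and license time when the grid is full of jobs that finish in a few minutes.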
I’ve spoken with EDA companies who actually like the Docker container cloud model for EDA. Some have even gotten it to work well! Some even see a benefit over traditional bare-metal extension into the cloud. Good for them! They can do that as well. But as you may have guessed from my blog, I think that there is an excellent opportunity for classic EDA in the cloud with newer filesystems such as VAST or classic filesystems such as ZFS and GPFS in a colo.
On bare metal, there is only one OS kernel running, and an advantage of this is that debugging engagements and performance-monitoring applications can be much more accurate and reliable. End-users and CAD support people want to deal with the tools and workflows, not the IT infrastructure.
When running on bare metal, you can also record important metrics such as CPU power consumption and temperatures. Some very enlightening metrics such as iowait, off-CPU time, cache hits, and memory bandwidth can only be measured accurately on bare metal. You also get the true network-interface statistics, which is a huge advantage over virtualized NICs. Those fine-grained metrics are the absolute key to unlocking full system potential, optimizing your EDA workload, and balancing your storage IO bandwidth load, resulting in the most efficient use of Cloud CPU time and EDA license time. Deterministic performance and high efficiency result in direct cost savings.
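As one illustration of collecting such a metric, the sketch below computes the iowait share of CPU time between two samples of the aggregate "cpu" line in Linux's /proc/stat. The helper names and the sample values are assumptions for illustration; a real monitoring setup would more likely rely on an existing collector.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat (first 8 counters).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_times(stat_line: str) -> dict:
    """Parse a 'cpu ...' line from /proc/stat into named tick counters."""
    fields = stat_line.split()
    values = [int(v) for v in fields[1:1 + len(CPU_FIELDS)]]
    return dict(zip(CPU_FIELDS, values))


def iowait_fraction(before: dict, after: dict) -> float:
    """Share of CPU time spent in iowait between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    return deltas["iowait"] / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate CPU line from a live Linux system."""
    with open(path) as handle:
        return handle.readline()


# Hypothetical samples taken a few seconds apart on a compute node.
sample_before = parse_cpu_times("cpu  1000 0 500 8000 400 0 100 0 0 0")
sample_after = parse_cpu_times("cpu  1600 0 700 8900 700 0 100 0 0 0")
print(f"iowait share: {iowait_fraction(sample_before, sample_after):.1%}")
```

On a live node you would call read_cpu_line() twice with a sleep in between and feed both results through parse_cpu_times(); a persistently high iowait share across the grid is a hint that the jobs are waiting on storage rather than computing.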
RPM Systems is an EDA/HPC specialized consulting group which partners with vendors to provide both on-prem system architecture and public cloud architecture solutions which are modular and consistent. There are no new problems, only new solutions to old problems. EDA has been trying for almost 10 years to adapt to the Cloud, and now the Cloud is adapting to EDA. RPM Systems has the breadth and depth of knowledge of the internal systems of a world-class EDA Private Cloud. With this next-generation Public Cloud infrastructure, we now have the ambition to forge a new Cloud Oriented Grid using a simple and modular approach which can be replicated on-prem or in Public Cloud.
RPM Systems also has advanced experience and products to create a feedback loop into your scheduler to limit the storage IO load, and thus keep the filer and network out of saturation, supporting the goal of maximum runtime efficiency. The RPMSAS™ Storage Aware Scheduler is a rule-based, machine-learning autonomous system which samples performance from the filer, the client, and LSF to automatically lower the load and keep job efficiency at maximum. At the moment, it integrates seamlessly with Oracle ZFS Storage Appliance and IBM Spectrum Scale, and soon the remarkable new VAST Universal Storage, but it can be integrated with other storage.
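To make the idea of a storage-aware feedback loop concrete, here is a minimal rule-based sketch in Python. It is not the RPMSAS™ implementation: the Sample fields, the thresholds, and the next_slot_limit function are all hypothetical, and the step that would apply the resulting limit to the scheduler is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample (hypothetical fields, for illustration only)."""
    filer_latency_ms: float  # average NFS latency reported by the filer
    client_iowait: float     # mean iowait fraction across compute nodes


def next_slot_limit(current: int, sample: Sample, *,
                    min_slots: int = 100, max_slots: int = 2000,
                    latency_high_ms: float = 20.0, latency_low_ms: float = 5.0,
                    iowait_high: float = 0.20) -> int:
    """Return the job-slot limit to use for the next scheduling interval."""
    if sample.filer_latency_ms > latency_high_ms or sample.client_iowait > iowait_high:
        proposed = current * 4 // 5   # storage is saturating: back off by 20%
    elif sample.filer_latency_ms < latency_low_ms:
        proposed = current + 50       # plenty of headroom: ramp up gently
    else:
        proposed = current            # inside the comfort band: hold steady
    return max(min_slots, min(max_slots, proposed))


# Hypothetical samples: a saturated filer, then a lightly loaded one.
print(next_slot_limit(1000, Sample(filer_latency_ms=30.0, client_iowait=0.10)))
print(next_slot_limit(1000, Sample(filer_latency_ms=2.0, client_iowait=0.05)))
```

The sketch backs off multiplicatively and ramps up additively, a common choice for control loops that must leave saturation quickly without oscillating; a production system would tune these rules per filer and per workload.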
High Performance Storage, File-level replication, minimal workflow change, and deterministic performance are the four missing links preventing EDA from getting to the next level. RPM Systems specializes in working with EDA companies and Cloud Providers to deliver an end-to-end solution and not just a framework. We are there to do that integration and co-design.