Hybrid Cloud Infrastructure to Manage AI Lifecycle

In this Tech Barometer podcast interview with Induprakas Keri, senior vice president and general manager for hybrid multicloud at Nutanix, learn why hybrid cloud IT systems help enterprises evolve their use of AI applications and data.
Find more enterprise cloud news, feature stories and profiles at The Forecast.
Transcript:
Induprakas Keri: The only place on the planet where you have enough computing power to create a foundational model is a public cloud.
Jason Lopez: This is the Tech Barometer podcast. I’m Jason Lopez. Induprakas Keri, general manager of hybrid multicloud at Nutanix, says that within the AI lifecycle there’s a minefield for IT. AI deployment spans different infrastructures for each of its three parts: training, augmentation and inferencing. The gist of this story is how these parts challenge the deployment of AI. Starting with part one, training: this is where the foundational model is created. Keri says public clouds are the place to do this because of the massive compute power needed, like a lot of GPUs, which you find on AWS, Azure or Google Cloud. In our interview, Keri described the first phase, the training, as akin to the gargantuan effort of creating a new language. We move on to the second part of the process, where the model undergoes augmentation.
Induprakas Keri: If you train a public foundational model with your proprietary data in the public cloud, you have basically given up ownership and control of the data. It’s now baked into that model and you can’t do much about it. So what a lot of organizations are going to do is take that foundational model and augment it, or do this thing called retrieval augmented generation, and have that happen on-prem, so that the data they control or own, the data that is really specific to their own success, is not in the public cloud. It remains proprietary and it remains under their control. That creates this new model. For example, you might have a generic foundational model that does support responses, but you might train it with your own data that makes it much more specific to your business or your products.
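To make that concrete, here is a minimal sketch in Python of the retrieval augmented generation pattern Keri describes. The documents, the bag-of-words embedding and the prompt assembly are all illustrative stand-ins; a real on-prem deployment would swap in a locally hosted embedding model and foundational model so that private text never leaves the network.

```python
"""Toy sketch of retrieval augmented generation (RAG) over private data."""
import math
from collections import Counter

# Proprietary documents that stay on-prem (illustrative examples).
PRIVATE_DOCS = [
    "Our support SLA promises a first response within 4 business hours.",
    "Model X-200 units shipped before 2023 need firmware 2.1 for the fix.",
]

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words vector. A real system would call
    # an on-prem embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank the private documents by similarity to the query.
    q = embed(query)
    ranked = sorted(PRIVATE_DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # The generic foundational model sees private context only at inference
    # time; nothing proprietary gets baked into its weights.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

# A real deployment would send this prompt to a locally hosted model endpoint.
print(build_prompt("What firmware fixes the X-200 issue?"))
```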
[Related: Building a Solid Enterprise AI Infrastructure Strategy]
Jason Lopez: The third part is inferencing. Here, Keri circles back to the language analogy.
Induprakas Keri: Somebody invents the French language, but then the French language is spoken by 50 million people or a hundred million people. Some group of people spent a lot of effort creating the French language, maybe 20 people, 30 people, 50 people, a hundred people, but then the usage is in millions, day after day after day. The law of large numbers I think applies here. Inferencing, which is using the model to make decisions, is something that you do millions of times. If you then reduce the energy that’s used in the inference by, let’s say, 30%, you save that energy not once, but a million times or millions of times.
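The arithmetic behind that point is simple; here is a back-of-envelope version in Python with illustrative numbers. The per-inference cost and request volume are assumptions; only the 30% figure comes from Keri.

```python
# Back-of-envelope: a per-request efficiency gain compounds across volume.
wh_per_inference = 0.5          # assumed energy cost of one inference (Wh)
requests_per_day = 10_000_000   # assumed daily inference volume
savings_rate = 0.30             # the 30% figure Keri uses

saved_kwh_per_day = wh_per_inference * requests_per_day * savings_rate / 1000
print(f"{saved_kwh_per_day:,.0f} kWh saved per day")  # 1,500 kWh per day
```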
Jason Lopez: He says most organizations that want to establish AI can’t be choosers, meaning they can’t be rigid about preferring one type of cloud over another.
Induprakas Keri: You can’t say that I’m only going to do things on the public cloud or only on the private cloud, because typically, unless you’re a Fortune 5 company, you’re not going to have the compute resources to train foundational models and you need to depend on the public cloud. But by the same token, you probably don’t want to train the foundational model with your specific data in the public cloud, and you certainly want inference close to the edge, which may or may not be in the public cloud. Sometimes public clouds provide inference endpoints, but most often inference happens at the edge, where the decision needs to be made, because of latency or other reasons.
[Related: Building a GenAI App to Improve Customer Support]
Jason Lopez: Keri emphasizes that OpenAI’s foundational models, while taking the world by storm, exposed challenges for IT in deploying these models in production. Companies need to make choices to avoid just throwing their data out there; you have to have a plan. In his job managing Nutanix’s hybrid cloud, Keri says there’s a big advantage to having the options hybrid brings, like running AI on a public cloud or on the company’s own infrastructure. This makes hybrid ideal for AI. A hybrid approach allows businesses to refine models in the public cloud, then move them to a private cloud and maintain control over sensitive data without exposing it unnecessarily.
Induprakas Keri: As users have become more aware of model capability, I think this whole notion of public data versus proprietary data has also started to become more of a consideration. Once they realize that there’s a clear boundary between private data and public data, they realize that for any foundational model they need to find a way to take a snapshot of it and create something that’s more in-house, so that the blast radius of training it with their data is limited, and then they might want to optimize that model so they can deploy it for inferencing.
[Related: Study Shows Big Uptake of Enterprise AI and Cloud Native Technologies]
Jason Lopez: In talking about AI, Keri boils it down to math, describing it as lots of matrix-vector multiplications performed in sequence that gradually transform raw input data into something meaningful. For an analogy, he draws upon Isaac Asimov’s Foundation trilogy. In the story there’s a fictional science, psychohistory, that blends history, sociology and mathematical statistics to predict the future behavior of large populations of people.
Induprakas Keri: AI seems a little bit like that. I think it’s exceptional at predicting average behavior, or generally a particularly prevalent pattern of behavior, but I think that’s also its limitation. The way that these linear algebra operations converge is different from how humans reason.
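Keri’s description of matrix-vector multiplications in sequence can be shown in a few lines. This toy two-layer network is only a sketch, with random weights standing in for learned ones, assuming NumPy is available.

```python
# Toy sketch: a sequence of matrix-vector products turns raw input into output.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # raw input data (a 4-dimensional vector)
W1 = rng.normal(size=(8, 4))  # layer 1 weights (learned, here just random)
W2 = rng.normal(size=(2, 8))  # layer 2 weights

h = np.maximum(W1 @ x, 0.0)   # matrix-vector product plus a nonlinearity
y = W2 @ h                    # another matrix-vector product
print(y)                      # "something meaningful": e.g. two class scores
```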
[Related: Bridging the Gap Between AI’s Promise and Fulfillment]
Jason Lopez: He points out that human reasoning is based on a kind of self-awareness. Humans have a view of the world, a top-down view.
Induprakas Keri: Whereas what recent AI has really done is take data and extract structure from it, but without something that provides a top-down view of the world. Newton didn’t come up with the laws of gravity by running regressions. It’s not like he took a billion apples, threw them from a billion trees or watched a billion apples fall, did a regression and said, oh wait, there is this thing. He came up with a model and then he ran experiments to validate the model. Generative AI is definitely missing that, and I think unless you have a model of the real world, you’re always going to fall short. Data alone can only give you so much insight.
Jason Lopez: Successful AI adoption is not a sure thing, particularly during the experimentation phase, but organizations are under pressure to invest in AI infrastructure quickly to avoid being left behind, even though it’s unclear which use cases will ultimately be successful. Fraud detection and AI copilots are two familiar use cases.
Induprakas Keri: Support is another one. I think if you go talk to a lot of online agents right now, they will interact with you in a very human-like way, even though they’re really powered by an AI chatbot. But beyond those three, there’s a lot of FOMO right now. I think organizations want to believe that AI is going to transform their business, and they don’t want to be left trailing in that race.
[Related: Future of AI: 9 Predictions for IT Innovation]
Jason Lopez: He says the belief that a substantial investment in AI will pay off doesn’t yet match reality across various use cases, especially in early stages of adoption.
Induprakas Keri: I think what you really want to do in this phase of experimentation is shorten the time to outcomes. You want to make sure that with the least effort possible, the least amount of hardware, infrastructure investment, training or what have you, I can gain that insight that I can then scale, and I think that’s what we are trying to do for our customers. You don’t need to spend six months setting up your infrastructure. You don’t need to spend three months setting up your Kubernetes environment. You don’t need to worry about what happens when you take this model that you have trained inside the data center out to the edge and all of a sudden find 75 issues because it was a different hardware platform. Every time you interact with ChatGPT, for example, you are parting with your insights. My son was looking for a book summary the other day. ChatGPT produced something that was very stylish but completely wrong, and then he spent like seven minutes fixing it, and I was telling him, you’re making ChatGPT better, but you’re not getting any value from that.
Jason Lopez: It’s a lesson for businesses.
Induprakas Keri: All you have to do is look at how much money OpenAI spends in terms of running its model. I mean, that’s not sustainable even if you are a hundred-million-dollar business.
Jason Lopez: And he says the massive data sets needed to train AI present this issue: when one person uses data, it doesn’t stop others from using it too.
Induprakas Keri: Data is non-rivalrous. It’s an economic term. If I give you a phone, for example, I might be able to kill that phone remotely, so that phone all of a sudden becomes useless. But if I give you a piece of data and you write it down or make a copy of it, there is no secret button that allows me to auto-destruct the data that you have.
Jason Lopez: Keri is confident that the ability to isolate the parts of a model not meant to be broadly shared will get better.
Induprakas Keri: It’s going to be hard to filter out public data from private data once the model has been produced. Unless we solve that problem, I think enterprises are going to be cautious in terms of how much private data they’re going to expose.
[Related: Role of CIO Expands with Enterprise AI]
Jason Lopez: AI models operate on a stack of hardware and software. The software stack, often open source and containerized, runs on a software-defined infrastructure in a hybrid cloud, which brings together storage, compute, networking and management tools.
Induprakas Keri: If you have to take that software infrastructure and run it on one platform in the data center, another platform at the edge and a third platform on the public cloud, then it’s your job as an IT organization to make sure that it runs the same way on all three of those infrastructures, at the same level of performance, management, usability and all of that.
Jason Lopez: A software stack that works across public clouds, data centers and edge environments makes deployment and management easier. Workloads can be moved between infrastructures with minimal friction.
Induprakas Keri: Even though all the components that you need for running an AI model are containerized, it takes a lot of effort to put all those different Kubernetes components together. The packaging of all those different components, making sure that they work together, making sure that they’re optimized for performance on the underlying hardware infrastructure, we are able to make that drop-dead simple, and that’s the value for organizations that are looking to do a lot of experimentation with AI really, really fast.
[Related: Speeding Software Development with AI-Assisted Coding]
Jason Lopez: Our interview with Keri kept returning to two themes: avoiding large AI investments toward an uncertain goal, and the advantage of a software-defined infrastructure.
Induprakas Keri: Which means that storage, compute, networking and management are all defined through APIs, all controllable through APIs, all managed through APIs, all governed through APIs. What that really does is let you stand up an infrastructure and then change its behavior really simply by writing code that invokes those APIs. Let’s say that I find out that I need to rethink my cluster because I have too much compute and not enough storage. Then, very simply, I add half a dozen storage-heavy nodes and I rebalance the storage and compute needs of my cluster to be able to run this model really well, and I can do all of that through software.
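Keri’s rebalancing example might look something like the sketch below. ClusterClient and its methods are hypothetical illustrations, not a real Nutanix SDK; the point is that in a software-defined stack each step is an API call driven from code.

```python
# Hypothetical sketch: reshaping a cluster through infrastructure APIs.
from dataclasses import dataclass

@dataclass
class ClusterClient:           # hypothetical client, not a real Nutanix SDK
    compute_nodes: int = 16
    storage_nodes: int = 4

    def add_storage_nodes(self, count: int) -> None:
        # In a software-defined stack this is an API call,
        # not a manual build-out in the data center.
        self.storage_nodes += count

    def rebalance_storage(self) -> None:
        # Another API call: redistribute data across the enlarged storage pool.
        print(f"Rebalancing data across {self.storage_nodes} storage nodes")

cluster = ClusterClient()
cluster.add_storage_nodes(6)   # "half a dozen storage-heavy nodes"
cluster.rebalance_storage()    # cluster now balanced to run the model well
```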
Jason Lopez: Induprakas Keri leads the Nutanix Hybrid Cloud team. This is the Tech Barometer podcast. I’m Jason Lopez. We’ve got more tech stories covering enterprise computing and IT technology at The Forecast by Nutanix.
