GPUs in Cloud Infrastructure with Veronica Nigro from mkinf |🎙️#50

Promotional graphic for "DevOps Accents Episode 50" featuring an illustration of a woman with the title "GPUs in Cloud Infrastructure with Veronica Nigro from MKINF" on an orange background.

What is the place of GPUs in the modern cloud environment? How did they become so integral to AI, and is the name now a bit confusing for the general public? What can we expect from this area? Our guest for episode 50 of DevOps Accents is Veronica Nigro, the co-founder of mkinf, a company providing access to distributed GPUs worldwide.

  • How GPUs are used in cloud infrastructure;
  • Should we rename GPUs?
  • What specific skills and knowledge are required for GPU-based infra?
  • Is there competition with hyperscalers?
  • Cost management for GPUs;
  • What will the future look like?

You can listen to episode 50 of DevOps Accents on Spotify.


The potential of GPUs in cloud infrastructure is undeniable, and understanding their role is essential in today’s AI-driven world. Veronica Nigro, co-founder of mkinf, joined Leo, Pablo and Kirill to discuss how GPUs are transforming industries, the challenges of managing them, and their future applications.

The Growing Role of GPUs in Cloud Infrastructure

Veronica begins by highlighting the fundamental difference between GPUs and CPUs. “GPUs excel in parallel processing,” she explains. This makes them ideal for tasks like AI model training, 3D rendering, and real-time data analysis. Despite their origins in gaming, GPUs now serve industries ranging from biomedical research to climate simulations.

Pablo reflects on this evolution: “It’s fascinating how GPUs, once just for video games, are now the backbone of AI.” Kirill adds, “The versatility of GPUs, especially for training and inference, makes them indispensable, but also introduces complexities in infrastructure.”

Should GPUs Be Renamed?

The term “GPU,” short for “graphics processing unit,” often creates misconceptions about what the hardware can actually do. Veronica agrees that the name doesn’t reflect its broader applications but notes that her technical audience generally understands its utility. “It’s not about renaming,” she says, “but about helping users identify the right GPU for their needs.”

Pablo humorously compares this to other industries: “It’s like realizing your gaming GPU is now solving biomedical problems. The shift is incredible, but the name hasn’t caught up.”


I’m not sure I'm in the position to change the name of GPUs, to be fair. They could definitely do with better naming, maybe. Although the people that we talk to are generally technical, so it's a step further: they already know that they need GPUs for their workloads. The next thing they might not know is what kind of GPUs they need. So there are a lot of models, a lot of applications for GPUs. And a particular application may need a more powerful GPU or a less powerful GPU. It's not necessarily about power; some, let's say, smaller and cheaper GPUs might be best optimized for certain types of tasks. And not everyone in the AI space, let's say not everyone who does AI, deploys AI models, trains models, is necessarily super familiar with AI infrastructure as well. So you need to understand the specifics of the hardware, the specifics of the memory as well, because you don't want to create a bottleneck in your process. So I think it's a bit of a, you know, a step further regarding GPU knowledge. — Veronica Nigro


Skills Needed for GPU Infrastructure

Working with GPU-based systems requires specialized knowledge. Veronica outlines the importance of understanding GPU models, memory optimization, and scaling strategies. “It’s not just about deploying GPUs,” she notes. “You need expertise in MLOps and orchestration to avoid bottlenecks.”

Kirill emphasizes the challenges teams face: “The same workload might behave differently on various GPUs. You have to match software with hardware to ensure consistency.” Pablo echoes this, adding that the high cost of GPUs means teams must carefully plan their use to maximize efficiency.


It's not only about understanding what kind of GPU you need, how much memory you need, and what the whole infrastructure requires; you also need to be able to work with it and configure it in terms of cost management and optimizations. For example, you want to have the best setup with scaling policies so that all your workloads and hardware are optimized in order to reduce idle time. At some point, once you train your models, you need to deploy these models and workloads. So, it's a whole theme regarding the orchestration of GPUs. And these are tasks that, you know, are specific to a lot of different positions that, I think, have emerged in technical teams. People who are usually in charge of dealing with this AI infrastructure are in DevOps positions. An ML engineer, software developer, or data scientist might not necessarily have the knowledge to deal with this infrastructure. And lots of startups that are emerging into this world might not have the resources to have all these positions. — Veronica Nigro


Competing with Hyperscalers

Veronica addresses the competition posed by cloud giants like AWS and Google. She explains that mkinf is tackling inefficiencies in GPU utilization by creating a standardized GPU pool. “We’re building a compute grid,” she says, “to give users scalable access to GPUs without the complexity of managing them directly.”

Kirill notes the advantage of such focused platforms: “Hyperscalers are often more expensive and less optimized for specific tasks. Startups like mkinf can offer tailored solutions that outperform larger providers in certain areas.”


I don't necessarily see them as competitive; I think they just have different use cases. The data center space is literally where people are pouring money. And you have the three big giants, the three big hyperscalers that you mentioned. With all these investments, we are pretty much creating silos of data centers here and there. In order to utilize them, first of all, they want to focus on their hardware, having the latest hardware, focusing on ESG metrics. So, they are not building products optimized for AI. That's the thing. And then they're all extremely different. If you want a more complete offering, taking GPUs from here and GPUs from there, you need to deal with different APIs, different protocols, different contracts. So, what we're doing is standardizing the access to all these independent data centers, which are, you know, more like the underdogs in this space, but can also give you extremely powerful machines with the certifications and everything, but at a fraction of the cost. So, we are standardizing this access and pretty much creating a GPU pool, kind of like an energy grid, but a compute grid, we call it, where people can share resources, take GPUs and put them on the grid. But also, the idea was always to create this solid ground to build something on top optimized for AI. So now we have this pool of GPUs, which is scalable and flexible; you can take machines by the minute and stop them whenever you want. But the idea is not just to give you a raw machine but to give you a simplified way to work with it. — Veronica Nigro


Managing GPU Costs

The expense of GPUs is a recurring theme in the conversation. Veronica points out that while costs might seem manageable initially, they can escalate quickly when scaled. “Even at $1 per hour, using 100 GPUs for extended periods adds up fast,” she explains.

To tackle this, Veronica advocates for flexible contracts, auto-scaling systems, and workload optimization. Pablo emphasizes the importance of solutions that simplify these decisions, especially for startups: “Nobody wants to deal with the nightmare of managing GPU costs manually.”


It's the units that you need to consider. It's like one GPU, and then, even if it's like $1 per hour, you might need like a hundred, and that's already like $100 per hour. And then you might need them for X amount of hours per month. And then you end up spending thousands of dollars per month, even if it just started with like $1 per hour. So, this is something that people, companies definitely want to consider and look out for, and look for other solutions that can help them optimize these costs. — Veronica Nigro
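
The arithmetic in the quote above can be sketched in a few lines of Python. This is only a rough illustration: the $1 per hour rate and the 100 GPUs come from the conversation, while the 200 hours per month and the function names are assumptions made up for this example, standing in for the "X amount of hours" Veronica mentions.

```python
# Back-of-the-envelope GPU cost estimate based on the figures in the quote.
# The $1/hour rate and the 100 GPUs come from the conversation; the monthly
# hours are an assumed placeholder for the "X amount of hours" mentioned.


def hourly_cost(rate_per_gpu_hour: float, gpu_count: int) -> float:
    """Total cost per hour for a fleet of identically priced GPUs."""
    return rate_per_gpu_hour * gpu_count


def monthly_cost(rate_per_gpu_hour: float, gpu_count: int, hours_per_month: float) -> float:
    """Total cost of running the whole fleet for the given hours in a month."""
    return hourly_cost(rate_per_gpu_hour, gpu_count) * hours_per_month


if __name__ == "__main__":
    rate = 1.0   # dollars per GPU per hour (from the quote)
    gpus = 100   # number of GPUs (from the quote)
    hours = 200  # hours of use per month (hypothetical value)

    print(f"Per hour:  ${hourly_cost(rate, gpus):,.2f}")
    print(f"Per month: ${monthly_cost(rate, gpus, hours):,.2f}")
```

With the assumed 200 hours of use, the fleet costs $100 per hour and $20,000 per month, which is how a "$1 per hour" price turns into thousands of dollars.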


What the Future Holds for GPUs

The group discusses the future of GPU technology and its applications. Veronica predicts a shift towards more localized computing, with GPUs potentially integrated into personal devices. “This space moves so fast,” she says. “What feels distant today might be reality tomorrow.”

Kirill speculates that generative AI models may eventually run directly on personal hardware, reducing reliance on cloud infrastructure. Veronica agrees that this is possible, but notes the current limitations of GPU size and power. Pablo concludes optimistically, saying, “This is the moment for platforms like mkinf. GPUs are in demand everywhere.”


This is a space that's not going to move quickly; it already moves way too fast. So the only thing you can do is pretty much be able to adapt as fast. Again, I'm not, you mentioned bubbly, you know, AI agents are going to be something of the future, but I think it's already happening. So once you say it's already happening, you need to be able to adapt. And what we want to build is something that is also able to adapt fast. So we mentioned before, there's all this infrastructure of GPUs, and we're actually GPU agnostic. We're not only taking Nvidia (right now, it's the main producer of these chips), but there are many other players right now that are producing their own chips, and they're more optimized for other specific tasks. So definitely being able to integrate all these other GPUs and have a more complete offering will make us more flexible and more ready for what's going to come ahead. But also having this infrastructure, which is flexible, scalable, and you can build pretty much anything on top related to AI, makes us quick to adapt to anything that might come, I think. We're not tied in with any crazy contracts or anything. So I think it's all about, this is a space that moves very fast and you need to adapt even faster. So it all comes down to that. — Veronica Nigro


GPUs at the Forefront of Innovation

GPUs are no longer confined to gaming or niche applications. They are driving advancements across industries, reshaping how we approach computing, and unlocking new possibilities for AI. The challenges of cost, complexity, and competition remain, but the future of GPUs is one of rapid growth and untapped potential.


I think the market, the space, is definitely moving towards, well, definitely closer to the users. So we're seeing, you know, urban data centers or, you know, smaller ones, especially because of the huge amount of energy they use. So this is probably something that needed to happen anyway—smaller, more distributed data centers. So we're definitely getting closer to the users. The next step, after there are urban data centers or smaller distributed data centers, is probably going to be GPUs or computing directly on your device. I don't have a timeframe for that because, at the moment, GPUs are this big and then they need to fit into your hardware. And even if I might say, you know, it'll take at least five years, it moves so quickly that it could happen next year. I have no idea. — Veronica Nigro



Show Notes:

  • Our guest, Veronica Nigro, the co-founder of mkinf, on LinkedIn;
  • mkinf, distributed inference for GenAI companies. mkinf provides access to distributed GPUs worldwide, reducing latency for businesses that need real-time AI responses. Its platform aggregates fragmented computational power across data centers, making it easier to scale and optimize AI infrastructure for model deployment and inference;
  • Evolving Drivers of AI Infrastructure Optimization, an interview with Veronica for more insights on inference.

Podcast editing: Mila Jones, milajonesproduction@gmail.com
