There are good reasons to move big data systems to the cloud, but doing so poses challenges for IT teams, which must migrate workloads and then manage the resulting clusters and compute instances.
Companies are increasingly moving big data clusters to the cloud for greater flexibility and easier scalability. But IT managers who have made the move warn that getting clusters there isn't easy, and that complex issues remain even after you do.
The hurdles start with workload and data migration challenges, and they continue on to a variety of data management issues, according to speakers and attendees at the 2017 Strata Data Conference here. They pointed to things like frequent system failures and the need to carefully manage transient clusters that are set up to run specific processing jobs and then shut down. In addition, they said, some workloads aren't a good fit for the cloud computing model, which can require integration with systems still running on premises.
The ability to automatically spin up and resize big data clusters in the cloud as needed fixes significant weaknesses of on-premises systems for Chris Mills, who leads the big data team at The Meet Group Inc., a New Hope, Pa., company that operates social networking and online dating sites.
After switching from an on-premises big data environment to one running in the Amazon Web Services (AWS) cloud, clusters can be added or expanded “in minutes,” Mills said. That has reduced IT costs and made exploratory analytics and “deep dive” applications more feasible, he added.
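To illustrate the kind of on-demand provisioning Mills described, here is a minimal sketch that launches a transient Amazon EMR cluster with the AWS SDK for Python (boto3). The cluster name, instance sizing and job script path are hypothetical, not details from The Meet Group's deployment.

```python
# Minimal sketch: launch a transient EMR cluster that runs one job and
# shuts itself down. Names, sizes and the script path are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="adhoc-analysis",                # hypothetical cluster name
    ReleaseLabel="emr-5.8.0",             # an EMR release current in 2017
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m4.xlarge",
        "SlaveInstanceType": "m4.xlarge",
        "InstanceCount": 5,
        # Transient cluster: terminate once all steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "deep-dive-job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/deep_dive.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```

Because the cluster terminates itself when its steps complete, capacity exists only for the life of the job, which is what makes the "in minutes" experimentation Mills described economical.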
But moving to the cloud “will cost more and take longer than you planned,” Mills warned during a session. In The Meet Group's case, that was partly because the project team identified potential new applications during the migration process. But unexpected problems also cropped up along the way, he said. All told, it took about six months to set up the cloud-based big data architecture and another six months to tune the environment.
Keeping up while moving to the cloud
At music-streaming service Spotify, moving thousands of workloads from an on-premises Hadoop cluster to a new architecture on the Google Cloud Platform has posed both technical and organizational challenges, said Alison Gilles, engineering director of the company's data infrastructure group.
Jobs couldn't start moving to the cloud if doing so would prevent other jobs from continuing to run successfully, noted Gilles, who works at the Stockholm-based company's U.S. headquarters in New York. Nor could product and operations teams freeze their workloads; they couldn't stop work on projects tied to the online service to focus on the migration effort.
To keep workflows from being blocked, Spotify is actively copying data back and forth between the on-premises cluster and the cloud architecture, said Josh Baer, who manages the data migration process. In August, that amounted to copying data for as many as 110,000 jobs.
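Spotify's internal copy tooling wasn't shown in detail, but conceptually each such copy resembles a DistCp run between HDFS and Google Cloud Storage. A minimal sketch in Python, assuming the Hadoop GCS connector is installed on the on-premises cluster and using made-up paths and bucket names:

```python
# Hypothetical sketch of one HDFS-to-GCS copy job, roughly what a
# bidirectional sync would run many thousands of times a month.
# Assumes the Hadoop GCS connector is installed; paths are made up.
import subprocess

def copy_to_gcs(hdfs_path: str, gcs_path: str) -> None:
    """Mirror one dataset from the on-premises cluster into GCS."""
    subprocess.run(
        ["hadoop", "distcp",
         "-update",          # only copy files that have changed
         hdfs_path, gcs_path],
        check=True,          # raise if the copy job fails
    )

copy_to_gcs("hdfs:///data/listening-events/2017-08-01",
            "gs://example-bucket/listening-events/2017-08-01")
```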
“We are suffering from some technical debt,” Baer acknowledged in a joint presentation with Gilles. “But we think the long-term benefits are worth some short-term pain here.”
The data infrastructure team has also developed a set of open source software tools to help streamline the migration. To support “lifting and shifting” workloads onto the cloud platform, Baer said, it built a tool that schedules migrated jobs to run in Docker containers through the Kubernetes orchestration system, plus automation technology that sets up transient big data clusters to handle the migrated workflows. A Scala API was also created for teams that want to rewrite their workloads as part of the migration process, although Baer said the infrastructure group encourages them to move their existing applications as is first.
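The talk didn't expose the scheduler's internals, but the pattern Baer described, packaging a migrated workload in a Docker image and submitting it as a Kubernetes Job, looks roughly like this sketch built on the official Kubernetes Python client; the image, namespace and arguments are hypothetical:

```python
# Hypothetical sketch of the pattern described: run one migrated
# workload as a containerized Kubernetes Job. Image, namespace and
# arguments are made up for illustration.
from kubernetes import client, config

config.load_kube_config()  # use local kubeconfig credentials

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="migrated-etl-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="etl",
                    image="gcr.io/example-project/etl-job:latest",
                    args=["--date", "2017-08-01"],
                )],
            )
        ),
        backoff_limit=2,  # retry a failed pod up to twice
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="migration", body=job)
```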
A push and then a sprint
In addition, designated infrastructure engineers work directly with teams that have particularly large data pipelines or need a “push” to get going, Baer said. The pairings are meant to drive sprint-style development projects that complete the moves in two weeks or less.
Spotify, which launched the cloud initiative in early 2016, has moved about 80% of its overall workloads, Gilles said. She expects most of the processing still being done on the on-premises cluster to be wound down next year. “But you can imagine that in the last 20% [of the migration], there are many dragons,” she added. “So they may take more time, relatively speaking.”
Based on The Meet Group's experience, plenty of dragons also lurk in the big data cloud beyond the migration phase. For example, system instances in Amazon's Elastic Compute Cloud (EC2) service “fail all the time,” Mills said. “It's not like working in a data center. You can't count on five-nines [of availability].”
Using EC2's Spot Instances feature, which lets AWS customers bid on spare compute nodes for temporary use, also poses risks, according to Mills. It can cut costs by 50% compared with Amazon's regular prices, but he has seen spot instances stop working in the middle of jobs because other users bid higher for the nodes. “You can have a job that's 80% complete and the instance disappears. That's a problem,” Mills said.
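A common defense, though not one Mills specifically described, is to poll the EC2 instance metadata service for the spot termination notice and checkpoint the job before the node is reclaimed. In this sketch, checkpoint_job is a hypothetical hook:

```python
# Sketch of a spot-interruption watcher. When AWS reclaims a spot
# instance, it posts a termination timestamp to this instance-metadata
# URL about two minutes in advance; until then the URL returns a 404.
import time

import requests

NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def checkpoint_job():
    """Hypothetical hook: save partial results so the job can resume."""
    print("Termination notice received; checkpointing job state...")

while True:
    resp = requests.get(NOTICE_URL, timeout=2)
    if resp.status_code == 200:  # timestamp present: node is going away
        checkpoint_job()
        break
    time.sleep(5)
```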
Beware of zombie cloud clusters
In addition, transient big data clusters spun up on dedicated cloud nodes need to be monitored closely to avoid what Mills described as zombie clusters, ones that aren't doing any processing work but are still running up costs. “It's unlikely that a data center server will run up a $40,000 bill over the weekend because someone left it running, but that has happened to us in the cloud,” he said.
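Mills didn't detail The Meet Group's safeguards, but a scheduled watchdog along these lines can catch clusters that are up but idle; the four-hour threshold and terminate-on-sight policy are assumptions for illustration:

```python
# Hypothetical zombie-cluster watchdog: find EMR clusters that are up
# but idle (WAITING with no pending or running steps) and kill them.
from datetime import datetime, timedelta, timezone

import boto3

emr = boto3.client("emr", region_name="us-east-1")
cutoff = datetime.now(timezone.utc) - timedelta(hours=4)  # assumed idle limit

for summary in emr.list_clusters(ClusterStates=["WAITING"])["Clusters"]:
    cluster_id = summary["Id"]
    # Any steps queued or running? If so, leave the cluster alone.
    steps = emr.list_steps(ClusterId=cluster_id,
                           StepStates=["PENDING", "RUNNING"])["Steps"]
    created = summary["Status"]["Timeline"]["CreationDateTime"]
    if not steps and created < cutoff:
        print(f"Terminating idle cluster {cluster_id}")
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
```

Run on a schedule, a check like this turns a weekend-long $40,000 surprise into at most a few hours of wasted node time.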
It's a similar situation at Ivy Tech Community College, which operates 45 campuses and satellite locations in Indiana. Ivy Tech uses an AWS-based big data architecture, plus Pentaho BI and data integration tools from Hitachi Vantara, to support a self-service analytics system for its business users. But the cloud platform isn't the answer to all of the Indianapolis-based college's needs, CTO Lige Hensley said.
“For me, the cloud is part of the toolbox,” Hensley said in an interview ahead of a session on Ivy Tech's deployment. “You can use it for a lot of things, but it's not perfect for everything. There are some workloads that we will never put into the cloud.”