Hadoop Is in Decline
Author: George Hill, editor-in-chief of the well-known business publication Innovation Enterprise and co-founder of The Cyclist. Translated by toypipi, 中山狼, 薯片番茄 and 班纳睿 of 可译网.
For a long time, Hadoop was everywhere and almost synonymous with big data. Three years ago, the idea of looking beyond Hadoop still seemed unthinkable. Three years on, the situation has changed somewhat.
As early as 2012, the well-known outlet SiliconANGLE analyzed Twitter conversations about big data among data professionals. It found that they talked about NoSQL technologies such as MongoDB as much as, or more than, Hadoop. This suggests that, at least among data scientists, using Hadoop as shorthand for big data was not accurate.
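Even so, most people regard Hadoop as one of the most important big data technologies, the foundation on which big data is built. It is also being put to use in new areas, such as data warehousing. That said, its adoption has surprisingly more or less stagnated. James Kobielus, Big Data Evangelist at IBM Software, put it this way: ‘Hadoop declined more rapidly in 2016 from the big-data landscape than I expected.’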
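The reasons are hard to pin down, but they may reflect a problem common in data circles. A 2015 Gartner survey found that 54% of companies had no plans to invest in Hadoop, while 44% had already adopted it or planned to within the next two years. Depending on your point of view, these figures suggest either that Hadoop would expand further or that most companies were ignoring it. The survey also revealed other telling factors whose effects are unlikely to have faded since. Among those not investing, 49% were still trying to work out how to get value from Hadoop, while 57% cited the skills gap as the main obstacle, a gap that will not close overnight. This matches Indeed's job-trend data for postings with ‘Hadoop Testing’ in the title. The term appeared in about 0.061% of ads in mid-2014 and rose to 0.087% in late 2016, an increase of about 43% in 18 months.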
This may indicate that Hadoop adoption has not necessarily fallen as far as anecdotal evidence suggests. Rather, companies are finding it hard to extract value from Hadoop with their existing teams, and they need more expertise.
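Another factor that may be cause for concern is simply that one person's big data is another person's small data. Hadoop is designed for huge volumes of data. As Kashif Saiyed wrote on KDnuggets: ‘You don’t need Hadoop if you don’t really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn’t shine at this scale.’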
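Most companies do not currently have enough data to justify a Hadoop deployment, but many rolled it out anyway because they felt they had to keep up with their peers. After a few years of experimentation, and of working alongside genuine data scientists, they soon realized that their data worked better with other technologies.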
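This trend has done more than slow the adoption of an open-source platform: for some companies it has had real financial consequences. Cloudera and Hortonworks are the two biggest companies building products on the Hadoop framework. Both have lost significant value, partly because of Hadoop's decline. Cloudera is reported to have lost 40% of its value, while Hortonworks' share price has fallen 68% since mid-2015.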
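The criticism of Hadoop in this article may seem harsh, but the platform itself did not cause the current problems. Rather, the real damage was probably done by the hype and by Hadoop's association with big data. Some companies adopted the platform without understanding it, and then lacked the right people or data to make it work properly. That led to disillusionment and apparent stagnation. Hadoop still has plenty of life in it; people just need to understand it better.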
Original text:
Three years ago, looking beyond Hadoop seemed like insanity; according to many in the media, little else could come close. However, the reality has been a little different.
For a long period, the media treated Hadoop and big data as almost interchangeable, although this was not necessarily the case amongst data scientists. In 2012, SiliconANGLE studied Twitter conversations about big data among data professionals. It found that they actually talked about NoSQL technologies like MongoDB as much as, or more than, Hadoop. That would indicate Hadoop has not actually been the must-have that many assumed it was.
Most would argue that Hadoop has been one of the single most important elements in the spread of big data, that it is very much the foundation on which data today is built. We are also still finding new ways to use it, in warehousing for instance. That being said, to the surprise of many, its adoption appears to have more or less stagnated, leading even James Kobielus, Big Data Evangelist at IBM Software, to claim that ‘Hadoop declined more rapidly in 2016 from the big-data landscape than I expected.’
The reasons for this are hard to ascertain, but could be down to a problem common in data circles. A 2015 study from Gartner found that 54% of companies had no plans to invest in Hadoop, while 44% of those asked had already adopted it or planned to at some point in the next two years. Depending on your point of view, this could be taken to mean either that Hadoop would see even further expansion or that the majority were ignoring it. However, the survey also revealed a number of other telling factors whose implications are unlikely to have subsided since. Of those who were not investing, 49% were still trying to figure out how to get value from it, while 57% said that the skills gap was the major reason. That gap is not going to close overnight. This coincides with findings from Indeed, which tracked job postings with ‘Hadoop Testing’ in the title. The term featured in a peak of 0.061% of ads in mid-2014 and then jumped to 0.087% in late 2016, an increase of around 43% in 18 months.
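The "around 43%" figure follows from the two ad-share percentages quoted above. The sketch below shows the calculation. The helper name relative_change is a hypothetical choice for illustration; only the two input values, 0.061% and 0.087%, come from the Indeed figures cited in the article.

```python
def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (0.43 means +43%)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old


# Share of job ads with 'Hadoop Testing' in the title, in percent, as cited above.
mid_2014 = 0.061
late_2016 = 0.087

change = relative_change(mid_2014, late_2016)
print(f"Increase: {change:.1%}")  # prints "Increase: 42.6%"
```

The result, about 42.6%, rounds to the roughly 43% increase quoted in the text.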
What this may signal is that adoption hasn’t necessarily dropped to the extent that anecdotal evidence would suggest. Rather, companies are simply finding it difficult to extract value from Hadoop with their current teams, and they require greater expertise.
Another element that may be cause for concern is simply that one man’s big data is another man’s small data. Hadoop is designed for huge amounts of data, and as Kashif Saiyed wrote on KDnuggets, ‘You don’t need Hadoop if you don’t really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn’t shine at this scale.’
Most companies do not currently have enough data to warrant a Hadoop rollout, but many rolled it out anyway because they felt they needed to keep up with the Joneses. After a few years of experimentation and of working alongside genuine data scientists, they soon realized that their data worked better with other technologies.
This trend has had effects beyond a slowdown in the adoption of an open-source platform, though: for some companies it has had real-world financial impacts. Cloudera and Hortonworks are two of the biggest companies that build their products on the Hadoop framework. Both have lost significant value, in part due to Hadoop’s decline. Cloudera is reported to have lost 40% of its value, while Hortonworks’ shares have plummeted 68% since mid-2015.
The criticism in this article may seem harsh on Hadoop, but it is not the platform itself that has caused the current issues. Instead, it is perhaps the hype around it and its association with big data that have done the real damage. Companies adopted the platform without understanding it and then failed to get the right people or data to make it work properly. That led to disillusionment and its apparent stagnation. There is still a huge amount of life in Hadoop; people just need to understand it better.