Score:-1

Why are Hadoop and Spark not in the official Ubuntu repositories?

UPDATE (2021-11-13 22:12 GMT+8): regarding the Snap packages, @karel suggested that this question is a duplicate of "Why don't the Ubuntu repositories have the latest versions of software?" I disagree, because (1) Snaps, being self-contained and bundled with all their dependencies, are different from deb packages, so I would expect them to follow upstream more closely, and (2) even if not, I would expect them to be in the stable channel by now.


I see this has already been asked in "Hadoop & Spark - why no Ubuntu packages?", but (1) that was back in 2015 and the computing landscape has changed a lot since then, and (2) the only response to that other question does not really answer it, so I thought it would be appropriate to ask again.

Now, in 2021, cloud computing and big data have only become more ubiquitous than they were in 2015. Considering that one of the major use cases of Linux is cloud computing / big data, why is the de facto way of setting up Hadoop and Spark (key frameworks for big data processing) still to download and unpack archives from upstream, instead of simply fetching the appropriate binary packages from the official Ubuntu repositories with an apt install command? Unless I'm missing something, I imagine that having such commonly used frameworks prepackaged for Ubuntu would bring a number of tangible benefits to a vast user base, such as (but not limited to):

  • Improved integration with the host system
  • Less manual setup and configuration required

P.S. I've also checked the Snap store, considering Canonical's push towards snaps in recent years. While both appear to be packaged there (Hadoop, Spark), the last efforts were back in 2017, and the snaps are only available in the unstable beta / edge channels.

karel
Does this answer your question? [Why don't the Ubuntu repositories have the latest versions of software?](https://askubuntu.com/questions/151283/why-dont-the-ubuntu-repositories-have-the-latest-versions-of-software)
Donald Sebastian Leung
No, because Hadoop and Spark do not seem to be in the official Ubuntu repositories _at all_ (I could not find anything relevant with `apt-cache search`)
karel
The hadoop and spark snap packages haven't been updated since 2017 either. That's what makes this question either a duplicate question or opinion-based.
Donald Sebastian Leung
But then (1) I'd expect Snap packages to follow upstream more closely, and (2) even if not, they should already be in stable by now
karel
I would expect the same thing too as both snap packages are maintained by the same person, but it didn't happen.
Score:2

Both Hadoop and Spark were dropped from Debian years ago, mostly due to a lack of volunteer interest in maintaining those packages. Ubuntu gets most of its deb packages from Debian, so they were dropped from Ubuntu, too.

Any community volunteer willing to learn the process and contribute the effort can re-introduce the packages to Debian, and they will subsequently flow into future releases of Ubuntu. More volunteers = More, better, and up-to-date software.

Also, according to https://wiki.debian.org/Hadoop, the Hadoop developers didn't make deb packaging and maintenance easy for the Debian volunteers:

> There are a number of reasons for this; in particular the Hadoop build process will load various dependencies via Maven instead of using distribution-supplied packages. Java projects like this are unfortunately not easy to package because of interdependencies; and unfortunately the Hadoop stack is full of odd dependencies

If this information is stale or incorrect, once again it's up to community volunteers to step up, make corrections, and implement changes. Debian and Ubuntu are driven by volunteers. More volunteers = Better documentation.

Donald Sebastian Leung
Thank you, this was the detailed explanation I was looking for. It's a shame that the Hadoop developers did not make it easy to package for distributions such as Debian (and Ubuntu). Maybe I should consider contributing sometime :-)