Confusion between Operational and Analytical Big Data and on which category Hadoop operates?
I can't wrap my head around the basic theoretical concept of 'Operational and Analytical Big Data'.
My understanding is:
Operational Big Data: the branch where we perform read/write operations on big data using specially designed databases (NoSQL). Somewhat similar to ETL in an RDBMS.
Analytical Big Data: the branch where we analyse data in retrospect and draw predictions using techniques like MPP and MapReduce. Somewhat similar to reporting in an RDBMS.
(Please feel free to correct me wherever I'm wrong; this is just my understanding.)
So, as I understand it, Hadoop is used for Analytical Big Data, where we just process data for analysis but don't tamper with the original data, and hence it is not an ideal choice for ETL.
But recently I came across this article, which advocates using Hadoop for ETL: https://www.datanami.com/2014/09/01/five-steps-to-running-etl-on-hadoop-for-web-companies/
hadoop bigdata
edited Jan 12 at 6:50 by cricket_007
asked Dec 31 '18 at 10:43 by Kajal_T
1 Answer
Hadoop (MapReduce) is not an efficient processing layer, IMO, without adequate tuning, so out of the box, the answer is neither. Sure, MapReduce could be used, and under the hood that API is what most higher-level tools depend on, but since those tools exist, you wouldn't want to write ETL jobs in plain MapReduce.
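To make "plain MapReduce" concrete, here is a minimal sketch of the programming model in ordinary Python. It does not use Hadoop's API at all; the log format, `RAW_LOGS`, and the function names are hypothetical and only illustrate the map, shuffle, and reduce steps that an ETL-style job would have to spell out by hand.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: raw log lines of the form "<user> <url> <bytes>".
RAW_LOGS = [
    "alice /home 512",
    "bob /home 128",
    "alice /cart 256",
    "malformed-line",
    "bob /cart 64",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Parse one line and emit (user, bytes); silently skip bad records."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return
    user, _url, size = parts
    yield user, int(size)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


bytes_per_user = run_job(RAW_LOGS)
print(bytes_per_user)  # {'alice': 768, 'bob': 192}
```

Even this toy aggregation needs three hand-written steps; in a higher-level tool the same job is typically a one-line GROUP BY, which is the point being made above.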
You can combine Hadoop with Spark, Presto, HBase, Hive, etc. to unlock these other Operational or Analytical layers; some are useful for reporting use cases, and others are useful for ETL. Again, there are plenty of knobs to tune before you get useful results in a reasonable time compared to an RDBMS (or other NoSQL tools). Plus, it takes several attempts to learn how best to store data in Hadoop to begin with (hint: not plaintext, and not lots of small files).
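As a rough illustration of why lots of small files hurt, the sketch below estimates how many file and block objects the HDFS NameNode would have to track for the same 1 TiB of data stored two ways. The figure of about 150 bytes of NameNode heap per object is a commonly quoted rule of thumb, not a measurement, and the 128 MiB block size is the usual default; treat both as assumptions.

```python
import math

# Assumption: commonly quoted rule of thumb, not a measured value.
BYTES_PER_NAMENODE_OBJECT = 150
# Assumption: the usual default HDFS block size (128 MiB).
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_objects(num_files: int, file_size: int) -> int:
    """Count one object per file plus one object per block of each file."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    return num_files * (1 + blocks_per_file)


def estimate_heap_bytes(num_files: int, file_size: int) -> int:
    """Very rough NameNode heap estimate for files of a uniform size."""
    return namenode_objects(num_files, file_size) * BYTES_PER_NAMENODE_OBJECT


MIB = 1024 ** 2
GIB = 1024 ** 3
TOTAL = 1024 ** 4  # 1 TiB of data in both layouts

small = estimate_heap_bytes(TOTAL // MIB, MIB)  # 1,048,576 files of 1 MiB
large = estimate_heap_bytes(TOTAL // GIB, GIB)  # 1,024 files of 1 GiB

print(f"1 MiB files: {small / MIB:.1f} MiB of NameNode heap")
print(f"1 GiB files: {large / MIB:.1f} MiB of NameNode heap")
```

Under these assumptions the small-file layout needs a few hundred times more NameNode memory for the same data, before counting the extra task-scheduling overhead of processing each tiny file separately.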
That link is more than 4 years old now and references Flume and Sqoop. Other "web scale" technologies have shown their worth in that time; meanwhile, Flume and Sqoop have shown their age and can be difficult to configure and manage compared to tools like Apache NiFi.
answered Jan 12 at 6:46 by cricket_007
edited Jan 12 at 6:51
Thank you! I am just starting to learn about Big Data analysis and was unconsciously comparing every concept to relational DB processing. I am gaining much more clarity on this every day as I continue learning. I truly appreciate your answer!
– Kajal_T
Jan 22 at 9:51