Spark Parquet write does not produce any files in the target folder
I am facing a weird situation. I am trying to read from Oracle and write Parquet files to an HDFS folder using spark-sql 2.3.1. Below is my code snippet:
df.write.format("parquet")
.mode("overwrite")
.partitionBy(partitionColumn)
.save(parquet_file)
When I run this code locally it works fine, but when I run the same code on an Apache Spark cluster it does not produce any output in the target folder.
I am not sure what is missing, and I don't see any errors in the logs. Interestingly, when I reduce the number of records read from the Oracle table, the partition folders are produced as expected. How can I solve this problem?
apache-spark apache-spark-sql parquet
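One way to narrow this down, assuming the default Hadoop FileOutputCommitter: Spark tasks first write under a _temporary subfolder of the target, and the files are moved into their final column=value partition folders (plus a _SUCCESS marker) only when the job commits. A target folder that holds only _temporary, or nothing at all, therefore suggests the job had not finished or committed when the folder was checked. The helper below is a hypothetical sketch (summarize_output is not part of Spark or Hadoop) that classifies a recursive listing of the target folder, for example the paths printed by hdfs dfs -ls -R, made relative to the target; the partition column "region" in the usage is made up.

```python
def summarize_output(relative_paths):
    """Classify a recursive listing of a Spark output folder (hypothetical helper).

    relative_paths: file paths relative to the target folder, e.g.
    "_SUCCESS" or "region=EU/part-00000-abc.snappy.parquet".
    Assumes the default Hadoop FileOutputCommitter layout.
    """
    summary = {
        "committed": False,      # top-level _SUCCESS marker seen
        "pending_temporary": 0,  # files still under _temporary/ (not yet committed)
        "partitions": set(),     # top-level "column=value" folders holding data
        "data_files": 0,         # committed part-* files
    }
    for rel in relative_paths:
        parts = rel.strip("/").split("/")
        if parts == ["_SUCCESS"]:
            summary["committed"] = True
        elif parts[0] == "_temporary":
            summary["pending_temporary"] += 1
        elif parts[-1].startswith("part-"):
            summary["data_files"] += 1
            # With partitionBy, data files sit inside "column=value" folders.
            if len(parts) > 1 and "=" in parts[0]:
                summary["partitions"].add(parts[0])
    return summary


# Usage with a made-up listing of a fully committed, partitioned write:
print(summarize_output([
    "_SUCCESS",
    "region=EU/part-00000-a.snappy.parquet",
    "region=US/part-00001-b.snappy.parquet",
]))
```

If the listing shows pending _temporary files and no _SUCCESS, the job is most likely still running or was stopped before commit, which would fit a write of this size taking much longer on the full table than on the reduced one.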
parquet_file here is the path of the target folder for the Parquet files.
– Shyam
Jan 4 at 6:39
Check two things: 1. The count of the DataFrame you are writing: does it have any data? 2. Print the path you are writing to (i.e. parquet_file) and the path where you are checking for files, just to make sure you have not mixed up a relative path.
– Harjeet Kumar
Jan 4 at 6:47
@HarjeetKumar 1. There is a lot of data in the table; the corresponding DataFrame has around 1,793,723,594 records, i.e. roughly 1,790 million. 2. When I read fewer records (1 million), I can see the files in the target path, but when I read all the records I don't see any files there. So it is not a path issue.
– Shyam
Jan 4 at 7:05
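To illustrate the relative-path concern from the comments: Hadoop resolves a path without a scheme against the default filesystem (fs.defaultFS), and a relative path against that filesystem's working directory (on HDFS typically /user/<username>). The same relative string can therefore land in a local project folder when run locally and under an HDFS home directory on the cluster. The function below is a simplified pure-Python sketch of that resolution rule, not the Hadoop API; the namenode address and user directories in the usage are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def qualify(path, default_fs, working_dir):
    """Simplified imitation of how Hadoop qualifies a path (illustrative only).

    default_fs:  e.g. "hdfs://namenode:8020" or "file:///" (hypothetical values)
    working_dir: absolute directory used for relative paths
    """
    if urlparse(path).scheme:  # already fully qualified, leave untouched
        return path
    if not path.startswith("/"):  # relative: resolve against the working dir
        path = posixpath.join(working_dir, path)
    base = default_fs[:-1] if default_fs.endswith("/") else default_fs
    return base + posixpath.normpath(path)


# The same relative path resolves to two different places:
print(qualify("out/parquet", "file:///", "/home/shyam/project"))
print(qualify("out/parquet", "hdfs://namenode:8020", "/user/shyam"))
```

Printing the fully qualified path on both environments, as the comment suggests, rules this out quickly.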
edited Jan 4 at 6:43 by Shaido
asked Jan 4 at 6:34 by Shyam