What should the input corpus for gensim LDA look like?

I tried two different kinds of input corpus for the gensim LDA model.
My documents are:
documents = ["Apple is releasing a new product",
"Amazon sells many things",
"Microsoft announces Nokia acquisition"]
texts = [[word for word in document.lower().split() if word not in stop_words] for document in documents]
texts1 = []
for i in texts:
    for t in i:
        texts1.append([t])
Then I used gensim to turn each of these into a corpus:
corpus = [[(0, 1), (1, 1), (2, 1), (3, 1)], [(4, 1), (5, 1), (6, 1), (7, 1)], [(8, 1), (9, 1), (10, 1), (11, 1)]]
corpus1 = [[(0, 1)], [(1, 1)], [(2, 1)], [(3, 1)], [(4, 1)], [(5, 1)], [(6, 1)], [(7, 1)], [(8, 1)], [(9, 1)], [(10, 1)], [(11, 1)]]
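For reference, the step that maps the token lists to these (token_id, count) pairs is presumably gensim's Dictionary plus doc2bow; a minimal sketch, assuming the same dictionary is reused for both variants (the exact integer ids depend on how the dictionary was built):

from gensim import corpora

dictionary = corpora.Dictionary(texts)                    # one integer id per unique token
corpus = [dictionary.doc2bow(text) for text in texts]     # one bag-of-words per document
corpus1 = [dictionary.doc2bow(text) for text in texts1]   # one bag-of-words per single word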
Is there a big difference between these two ways of feeding the LDA model?
When I try both, the difference shows up in the probability distributions of the words within the topics: the probabilities from corpus1 are much smaller than those from corpus.
When I run LDA on a larger set of documents, corpus1 always gives me extremely low probabilities, around 0.0001.
Is there a better way to feed a corpus into the LDA model?
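For reference, a minimal sketch of how such a corpus is usually fed into the model (dictionary is the one built above; num_topics, passes, and random_state are arbitrary placeholder values):

from gensim import models

lda = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=42)
# each topic is a list of (word, probability) pairs
for topic_id, topic in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, topic)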
python-3.x gensim lda topic-modeling
asked Dec 28 '18 at 3:31 by wayne64001 (414)
0 Answers