Using loc to select columns results in some NaNs in row values
I have a dataframe, m, that looks like this, and I plan to turn it into training labels after removing a small number of test labels:
SampleNbr 1 2 3 4 5 6 7 8 9 10 ... 12155 12156 12157 12158 12159 12165 12166 12167 12168 12169
om_unspec 10.0 8.24 10.0 6.78 10.0 8.54 10.0 10.0 10.0 10.0 ... 2.68 3.37 1.67 1.74 1.25 6.2 5.69 4.2 3.01 1.43
1 rows × 519 columns
I have a training set that I've created by removing a fraction of inputs, by column, this way:
train_dataset = l.sample(frac=0.8, random_state=0, axis=1)
The resulting columns left in train_dataset look like this:
Int64Index([421, 107, 310, 233, 173, 15, 134, 230, 438, 97,
...
256, 94, 494, 95, 470, 169, 69, 305, 48, 341],
dtype='int64', length=415)
I want to keep the same columns in my training labels as in my training data, so I select from m using the columns of train_dataset:
train_labels = m.loc[:, train_dataset.columns]
But this results in:
421 107 310 233 173 15 134 230 438 97 ... 256 94 494 95 470 169 69 305 48 341
om_unspec NaN NaN NaN NaN 10.0 9.59 NaN NaN NaN 10.0 ... NaN 10.0 NaN NaN NaN 10.0 10.0 NaN NaN NaN
1 rows × 415 columns
So, the size is correct, the columns I want are correct, but the row data is mostly NaN. I have a feeling this has to do with m having the 'SampleNbr' index and train_labels NOT having the 'SampleNbr' index, but I don't know how to fix it. If I use iloc, I don't get the right set of columns in the training labels.
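One way to test that hunch is to check which of the sampled column labels actually exist as column labels in m. The sketch below uses small made-up frames: the values, and the assumption that l is labelled 0, 1, 2, 3 while m is labelled by SampleNbr, are illustrative and not the real data. It uses reindex to show the NaN filling, since recent pandas versions raise a KeyError when .loc is given a list containing missing labels.

```python
import pandas as pd

# Hypothetical stand-ins: m is labelled by SampleNbr, l by position 0..3.
m = pd.DataFrame(
    [[10.0, 8.24, 6.78, 2.68]],
    index=["om_unspec"],
    columns=pd.Index([1, 2, 3, 12155], name="SampleNbr"),
)
l = pd.DataFrame([[0.1, 0.2, 0.3, 0.4]])  # columns are 0, 1, 2, 3

train_dataset = l.sample(frac=0.75, random_state=0, axis=1)

# Which sampled column labels are real labels in m, and which are not?
found = train_dataset.columns.isin(m.columns)
missing = train_dataset.columns[~found]
print("labels not present in m:", missing.tolist())

# reindex fills NaN for every label that m does not have.
relabelled = m.reindex(columns=train_dataset.columns)
print(relabelled)
```

If many labels come back as missing, the two frames are probably not labelled the same way, and the values that do appear may belong to a different sample than intended.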
pandas
What is m? Can you show it?
– YOLO
Jan 4 at 18:34
SampleNbr 1 2 3 4 5 6 7 8 9 10 ... 12155 12156 12157 12158 12159 12165 12166 12167 12168 12169 om_unspec 10.0 8.24 10.0 6.78 10.0 8.54 10.0 10.0 10.0 10.0 ... 2.68 3.37 1.67 1.74 1.25 6.2 5.69 4.2 3.01 1.43 1 rows × 519 columns
– julieb
Jan 4 at 19:25
I'm having some trouble replicating. What is l, and how is it different than m?
– Polkaguy6000
Jan 4 at 20:03
column_names = ['SampleNbr', 'om_unspec', 'A', 'B', 'C', 'D', 'E', 'F']
df = pd.read_excel(xlsx, usecols=column_names)
df2 = df.loc[:, ['SampleNbr', 'om_unspec']]
df2.set_index("SampleNbr", inplace=True)
df.pop('om_unspec')
m = df2[~df2.index.duplicated(keep='first')]
m = m.transpose()
l = [y.set_index('SampleNbr').stack().reset_index(drop=True) for x, y in df.groupby('SampleNbr')]
l = pd.concat(l, axis=1)
... 138 rows × 519 columns
– julieb
Jan 4 at 20:13
l is the 'raw data' for the training set, m is the raw data for the training labels
– julieb
Jan 4 at 20:26
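A possible reading of the snippet above is that pd.concat over a list of unnamed Series numbers the columns of l as 0, 1, 2, ..., while m keeps SampleNbr as its column labels, so the two frames do not share labels. The sketch below illustrates that reading on made-up data; the names l_positional and l_labelled and all values are hypothetical. It shows one option, keying the pieces by SampleNbr so that the sampled columns can be looked up in m directly.

```python
import pandas as pd

# Hypothetical data: three samples with SampleNbr labels 1, 2 and 12155.
df = pd.DataFrame({
    "SampleNbr": [1, 1, 2, 2, 12155, 12155],
    "om_unspec": [10.0, 10.0, 8.24, 8.24, 2.68, 2.68],
    "A": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# Labels: a single row whose columns are labelled by SampleNbr.
m = df.drop_duplicates("SampleNbr").set_index("SampleNbr")[["om_unspec"]].T

# Inputs: one column of values per sample, keyed by its SampleNbr.
parts = {
    nbr: grp["A"].reset_index(drop=True)
    for nbr, grp in df.groupby("SampleNbr")
}

# A list of unnamed Series yields positional column labels 0, 1, 2.
l_positional = pd.concat([pd.Series(s.to_numpy()) for s in parts.values()], axis=1)

# A dict keeps its keys (the SampleNbr values) as the column labels.
l_labelled = pd.concat(parts, axis=1)

# Sampling columns of the labelled frame gives labels that exist in m.
train_dataset = l_labelled.sample(n=2, random_state=0, axis=1)
train_labels = m.loc[:, train_dataset.columns]
print(train_labels)
```

If l has to keep positional column labels, selecting by position with m.iloc[:, train_dataset.columns] is an alternative, assuming both frames list the samples in the same order; the resulting column labels would then be SampleNbr values and not positions.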
asked Jan 4 at 18:30 by julieb