Using loc to select columns results in some NaNs in row values





I have a dataframe, m, that looks like this, and I plan to turn it into training labels after removing a small number of test labels:



SampleNbr   1   2   3   4   5   6   7   8   9   10  ... 12155   12156       12157   12158   12159   12165   12166   12167   12168   12169
om_unspec 10.0 8.24 10.0 6.78 10.0 8.54 10.0 10.0 10.0 10.0 ... 2.68 3.37 1.67 1.74 1.25 6.2 5.69 4.2 3.01 1.43
1 rows × 519 columns


I have a training set that I've created by removing a fraction of inputs, by column, this way:



train_dataset = l.sample(frac=0.8, random_state=0, axis=1)


The resulting columns left in train_dataset look like this:



Int64Index([421, 107, 310, 233, 173,  15, 134, 230, 438,  97,
...
256, 94, 494, 95, 470, 169, 69, 305, 48, 341],
dtype='int64', length=415)
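
As a rough illustration of what sample does with axis=1, here is a minimal sketch. The frame l_demo is a hypothetical stand-in for l; its shape and its default integer column labels are assumptions, not the real data. The sampled frame keeps the original column labels, shuffled and reduced in number.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `l`: 3 rows, 10 columns labelled 0..9.
l_demo = pd.DataFrame(np.arange(30).reshape(3, 10))

# Sample 80% of the columns (axis=1), as in the question.
train_demo = l_demo.sample(frac=0.8, random_state=0, axis=1)

# The surviving labels are a shuffled subset of the original labels.
print(train_demo.columns.tolist())
print(train_demo.shape)
```

The point of the sketch is only that the labels in train_demo.columns are whatever labels l_demo carried, so they are meaningful for another frame only if that frame uses the same labelling.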


I want to keep the same columns in my training labels as in my training data, so I select from m using the train_dataset columns:



train_labels = m.loc[:, train_dataset.columns]


But this results in:



421 107 310 233 173 15  134 230 438 97  ... 256 94  494 95  470 169 69      305 48  341
om_unspec NaN NaN NaN NaN 10.0 9.59 NaN NaN NaN 10.0 ... NaN 10.0 NaN NaN NaN 10.0 10.0 NaN NaN NaN
1 rows × 415 columns


So, the size is correct, the columns I want are correct, but the row data is mostly NaN. I have a feeling this has to do with m having the 'SampleNbr' index and train_labels NOT having the 'SampleNbr' index, but I don't know how to fix it. If I use iloc, I don't get the right set of columns in the training labels.
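
One possible explanation, sketched below under assumptions: if the values in train_dataset.columns are plain positions (0, 1, 2, ...) while m's columns are labelled by SampleNbr, a label-based lookup only finds the few positions that happen to coincide with a SampleNbr value. The names m_demo and positions are hypothetical, and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `m`: one row, columns labelled by SampleNbr.
m_demo = pd.DataFrame(
    [[10.0, 8.24, 10.0, 6.78, 9.59, 2.68]],
    index=["om_unspec"],
    columns=pd.Index([1, 2, 3, 5, 8, 13], name="SampleNbr"),
)

# Hypothetical sampled columns: positions 0..5, not SampleNbr values.
positions = pd.Index([4, 0, 2])

# Label-based lookup: only label 2 exists in m_demo, so the others are NaN.
# reindex is used because recent pandas versions raise KeyError from .loc
# when a requested label is missing.
by_label = m_demo.reindex(columns=positions)

# Position-based lookup: picks the 5th, 1st and 3rd columns of m_demo.
by_position = m_demo.iloc[:, positions.to_numpy()]

print(by_label)
print(by_position)
```

With the positional form, the resulting column labels are SampleNbr values rather than the sampled integers, which may be why iloc appears to return the "wrong" columns. It only lines up with the training data if m's columns are in the same order as l's, which the question does not confirm.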










  • What is m? Can you show it?

    – YOLO
    Jan 4 at 18:34











  • SampleNbr 1 2 3 4 5 6 7 8 9 10 ... 12155 12156 12157 12158 12159 12165 12166 12167 12168 12169
    om_unspec 10.0 8.24 10.0 6.78 10.0 8.54 10.0 10.0 10.0 10.0 ... 2.68 3.37 1.67 1.74 1.25 6.2 5.69 4.2 3.01 1.43
    1 rows × 519 columns

    – julieb
    Jan 4 at 19:25











  • I'm having some trouble replicating this. What is l, and how is it different from m?

    – Polkaguy6000
    Jan 4 at 20:03











  • column_names = ['SampleNbr', 'om_unspec', 'A', 'B', 'C', 'D', 'E', 'F']
    df = pd.read_excel(xlsx, usecols=column_names)
    df2 = df.loc[:, ['SampleNbr', 'om_unspec']]
    df2.set_index("SampleNbr", inplace=True)
    df.pop('om_unspec')
    m = df2[~df2.index.duplicated(keep='first')]
    m = m.transpose()
    l = [y.set_index('SampleNbr').stack().reset_index(drop=True) for x, y in df.groupby('SampleNbr')]
    l = pd.concat(l, axis=1)
    ... 138 rows × 519 columns

    – julieb
    Jan 4 at 20:13











  • l is the 'raw data' for the training set, m is the raw data for the training labels

    – julieb
    Jan 4 at 20:26
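
A small sketch of the construction described in the comments above, on made-up data: concatenating unnamed Series with axis=1 gives positional column labels (0, 1, ...), whereas passing keys labels each column by its group. The frame df_demo and its values are hypothetical, and using keys here is a suggestion, not something the original code does.

```python
import pandas as pd

# Hypothetical miniature of `df`: two feature columns, two rows per sample.
df_demo = pd.DataFrame({
    "SampleNbr": [7, 7, 3, 3],
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [5.0, 6.0, 7.0, 8.0],
})

groups = list(df_demo.groupby("SampleNbr"))
pieces = [
    g.set_index("SampleNbr").stack().reset_index(drop=True)
    for _, g in groups
]

# Without keys, the unnamed Series get positional column labels.
l_positional = pd.concat(pieces, axis=1)

# With keys, each column is labelled by its SampleNbr.
l_labelled = pd.concat(pieces, axis=1, keys=[k for k, _ in groups])

print(l_positional.columns.tolist())
print(l_labelled.columns.tolist())
```

If l were labelled this way, the columns sampled from it would be SampleNbr values and could be looked up in m by label.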


















pandas






asked Jan 4 at 18:30 by julieb