Using loc to select columns results in some NaNs in row values


I have a dataframe, m, that looks like this, and I plan to turn it into training labels after removing a small number of test labels:

SampleNbr  1     2     3     4     5     6     7     8     9     10    ...  12155  12156  12157  12158  12159  12165  12166  12167  12168  12169
om_unspec  10.0  8.24  10.0  6.78  10.0  8.54  10.0  10.0  10.0  10.0  ...  2.68   3.37   1.67   1.74   1.25   6.2    5.69   4.2    3.01   1.43

1 rows × 519 columns

I have a training set that I've created by removing a fraction of inputs, by column, this way:

train_dataset = l.sample(frac=0.8, random_state=0, axis=1)

The resulting columns left in train_dataset look like this:

Int64Index([421, 107, 310, 233, 173,  15, 134, 230, 438,  97,
            ...
            256,  94, 494,  95, 470, 169,  69, 305,  48, 341],
           dtype='int64', length=415)
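For reference, `DataFrame.sample(..., axis=1)` draws whole columns and keeps the labels they already had rather than renumbering them, so the integers above are simply labels carried over from `l`. A minimal sketch on a small made-up frame (`toy` and its values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical 2-row frame whose column labels are arbitrary integers.
toy = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
                   columns=[10, 20, 30, 40, 50])

# Keep 80% of the columns (round(0.8 * 5) = 4), chosen reproducibly at random.
sampled = toy.sample(frac=0.8, random_state=0, axis=1)

# Each sampled column keeps its original label and values; only the order changes.
print(sampled.columns.tolist())
for label in sampled.columns:
    assert sampled[label].equals(toy[label])
```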

I want to keep the same columns in my training labels as in my training data, so I select from m using the train_dataset columns:

train_labels = m.loc[:, train_dataset.columns]

But this results in:

           421   107   310   233   173   15    134   230   438   97    ...  256   94    494   95    470   169   69    305   48    341
om_unspec  NaN   NaN   NaN   NaN   10.0  9.59  NaN   NaN   NaN   10.0  ...  NaN   10.0  NaN   NaN   NaN   10.0  10.0  NaN   NaN   NaN

1 rows × 415 columns
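`.loc` is purely label-based: `m.loc[:, [421, 107, ...]]` asks for the columns *labelled* 421, 107 and so on, and in `m` those labels are SampleNbr values, not positions. The pandas releases current at the time (0.2x) returned NaN for requested labels that did not exist, with a FutureWarning; pandas 1.0 and later raise `KeyError` instead. `reindex` reproduces the NaN-filling behavior explicitly. A sketch with hypothetical data (`m_toy` stands in for `m`):

```python
import pandas as pd

# Hypothetical stand-in for `m`: one row, columns labelled by SampleNbr.
m_toy = pd.DataFrame([[10.0, 8.24, 9.59]], index=["om_unspec"],
                     columns=[1, 2, 15])

# Labels that exist in m_toy pick out the matching values, in the order asked.
selected = m_toy.loc[:, [15, 1]]
print(selected)

# 421 and 107 are not SampleNbr labels in m_toy, so there is no data behind
# them; reindex fills those columns with NaN (the old .loc behavior).
reindexed = m_toy.reindex(columns=[421, 107, 15, 1])
print(reindexed)
```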

So, the size is correct, the columns I want are correct, but the row data is mostly NaN. I have a feeling this has to do with m having the 'SampleNbr' index and train_labels NOT having the 'SampleNbr' index, but I don't know how to fix it. If I use iloc, I don't get the right set of columns in the training labels.
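One way to test that hunch is to check which sampled labels actually exist among `m`'s columns, and to compare with `.iloc`, which ignores labels and selects by position. With integer labels the two only agree when the labels happen to be 0, 1, 2, .... A sketch with hypothetical frames (`features` plays the role of `l`, `labels` the role of `m`):

```python
import pandas as pd

# Hypothetical frames: `features` has positional labels 0..3 (like a frame
# built with pd.concat of unnamed Series), `labels` has SampleNbr labels.
features = pd.DataFrame([[0.1, 0.2, 0.3, 0.4]])
labels = pd.DataFrame([[10.0, 8.24, 6.78, 9.59]], index=["om_unspec"],
                      columns=[1, 2, 4, 15])

# Which feature labels also exist as SampleNbr labels in `labels`?
found = features.columns.isin(labels.columns)
print(features.columns[~found].tolist())  # [0, 3] -> these would come back NaN

# .iloc picks by position, so [0, 3] means "first and fourth column".
print(labels.iloc[:, [0, 3]].columns.tolist())  # [1, 15]
```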

asked Jan 4 at 18:30

julieb


What is m? Can you show it?

– YOLO
Jan 4 at 18:34

SampleNbr  1     2     3     4     5     6     7     8     9     10    ...  12155  12156  12157  12158  12159  12165  12166  12167  12168  12169
om_unspec  10.0  8.24  10.0  6.78  10.0  8.54  10.0  10.0  10.0  10.0  ...  2.68   3.37   1.67   1.74   1.25   6.2    5.69   4.2    3.01   1.43
1 rows × 519 columns

– julieb
Jan 4 at 19:25

I'm having some trouble replicating. What is l and how is it different than m?

– Polkaguy6000
Jan 4 at 20:03

import pandas as pd

column_names = ['SampleNbr', 'om_unspec', 'A', 'B', 'C', 'D', 'E', 'F']
df = pd.read_excel(xlsx, usecols=column_names)
# m: one om_unspec label per SampleNbr, transposed so SampleNbr values become the column labels
df2 = df.loc[:, ['SampleNbr', 'om_unspec']]
df2.set_index("SampleNbr", inplace=True)
df.pop('om_unspec')
m = df2[~df2.index.duplicated(keep='first')]
m = m.transpose()
# l: one stacked Series of feature values per SampleNbr group, concatenated side by side
l = [y.set_index('SampleNbr').stack().reset_index(drop=True) for x, y in df.groupby('SampleNbr')]
l = pd.concat(l, axis=1)
...
138 rows × 519 columns

– julieb
Jan 4 at 20:13
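The code in that comment points at where the mismatch may come from: `pd.concat` on a list of unnamed Series along `axis=1` numbers the resulting columns 0, 1, 2, ... by position, so `l` (and therefore `train_dataset`) would carry positional labels while `m` carries SampleNbr labels. One possible fix, sketched on made-up data, is to pass a dict keyed by SampleNbr so that the keys become the column labels. This assumes `m`'s columns are exactly those SampleNbr values; `df`, `m_toy`, `train_toy` and all values below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: several rows per SampleNbr, feature columns A, B.
df = pd.DataFrame({"SampleNbr": [15, 15, 3, 3, 97, 97],
                   "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   "B": [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]})

# As in the comment: a list of unnamed Series gives positional labels 0, 1, 2.
as_list = pd.concat([y.set_index("SampleNbr").stack().reset_index(drop=True)
                     for _, y in df.groupby("SampleNbr")], axis=1)
print(as_list.columns.tolist())  # [0, 1, 2]

# A dict keyed by SampleNbr makes each key the column label instead.
as_dict = pd.concat({k: y.set_index("SampleNbr").stack().reset_index(drop=True)
                     for k, y in df.groupby("SampleNbr")}, axis=1)
print(as_dict.columns.tolist())  # [3, 15, 97]

# Hypothetical labels frame with matching SampleNbr column labels, like `m`.
m_toy = pd.DataFrame([[4.2, 10.0, 1.43]], index=["om_unspec"],
                     columns=[3, 15, 97])

# Sample the feature columns, then select the labels for the same samples.
train_toy = as_dict.sample(frac=0.67, random_state=0, axis=1)
train_labels_toy = m_toy.loc[:, train_toy.columns]
print(train_labels_toy.isna().any().any())  # False
```

Because `groupby` sorts its keys by default, the dict-based columns come out in sorted SampleNbr order; that does not matter for `.loc`, which matches by label rather than position.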

l is the 'raw data' for the training set; m is the raw data for the training labels.

– julieb
Jan 4 at 20:26

