Pandas read_csv() conditionally skipping header row




























I'm trying to read CSV files, but they don't all share the same format. I want to add checks so that I don't need to edit my code or my input files.



My problem is that some of these CSV files have a line of text above the column headers. An example:



Created on 12-11-2018,CryptoDataDownload.com
Date,Symbol,Open,High,Low,Close,Volume From,Volume To
2018-12-11 11-AM,ADABTC,8.6e-06,8.61e-06,8.55e-06,8.57e-06,301141.7,2.59
2018-12-11 10-AM,ADABTC,8.69e-06,8.72e-06,8.6e-06,8.6e-06,236949.63,2.05


If I import this, the parser treats the first line as the header and splits the file into two columns, Created on 12-11-2018 and CryptoDataDownload.com.



This is what df.head() looks like:



                        Created on 12-11-2018 CryptoDataDownload.com
Date Symbol Open High Low Close Volume From Volume To
2018-12-11 11-AM ADABTC 8.6e-06 8.61e-06 8.55e-06 8.57e-06 301141.7 2.59
2018-12-11 10-AM ADABTC 8.69e-06 8.72e-06 8.6e-06 8.6e-06 236949.63 2.05
2018-12-11 09-AM ADABTC 8.7e-06 8.7e-06 8.62e-06 8.69e-06 509311.39 4.41
2018-12-11 08-AM ADABTC 8.69e-06 8.7e-06 8.63e-06 8.7e-06 111367.34 0.9656


I want to check if this file has this line and skip it if so.



How can I do this?







python pandas csv














edited Dec 12 '18 at 13:37 by coldspeed










asked Dec 12 '18 at 8:49 by iso_9001_








  • So, did you try adding skiprows=1 to read_csv?

    – coldspeed
    Dec 12 '18 at 8:51











  • @coldspeed Not all my files have this line, so I need to check first whether it exists. Otherwise, I would skip my column headers.

    – iso_9001_
    Dec 12 '18 at 8:53











  • Do all of your dataframes have the same header, or could it be different? Is there a pattern associated with these headers (for example, "created on...")?

    – coldspeed
    Dec 12 '18 at 8:56













  • The headers differ too, but I edit them to be the same after import.

    – iso_9001_
    Dec 12 '18 at 8:59











  • I recommend moving the break to one indent level up, since it seems you only need to check the first line.

    – coldspeed
    Dec 12 '18 at 13:15


























2 Answers















If the headers in your CSV files follow a similar pattern, you can do something simple like sniffing out the first line before determining whether to skip the first row or not.



import pandas as pd

filename = '/path/to/file.csv'
# skiprows is 1 if the first line is the "Created on ..." banner, else 0
skiprows = int('Created on' in next(open(filename)))
df = pd.read_csv(filename, skiprows=skiprows)




Good practice would be to use a context manager, so you could also do this:



filename = '/path/to/file.csv'
skiprows = 0
with open(filename, 'r') as f:
    for line in f:
        if line.startswith('Created '):
            skiprows = 1
        break  # only the first line needs checking
df = pd.read_csv(filename, skiprows=skiprows)














edited Dec 12 '18 at 14:17

























answered Dec 12 '18 at 8:59 by coldspeed













  • This gives me TypeError: 'numpy.ndarray' object is not callable. Is next(f) 0-indexed?

    – iso_9001_
    Dec 12 '18 at 10:51











  • I edited my post.

    – iso_9001_
    Dec 12 '18 at 11:19











  • I managed to make it work with a little tweak. Thank you.

    – iso_9001_
    Dec 12 '18 at 12:28











  • @iso_9001_ that was a weird error to run into from my code. Were you able to figure out the bug? Feel free to edit my answer. Thanks!

    – coldspeed
    Dec 12 '18 at 13:14











  • You might have deleted my working code while editing. I think your code doesn't work because the startswith function returns True or False, not an index.

    – iso_9001_
    Dec 12 '18 at 13:57


































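The comment argument of pandas read_csv takes a single character after which the rest of a line is not parsed; a line that begins with that character is skipped entirely. In your case, you could skip the line that starts with "C" using the following code, provided "C" appears nowhere else in the file: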



filename = '/path/to/file.csv'
df = pd.read_csv(filename, comment="C")
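
When no single character reliably marks the extra line, a structural check is one alternative. The sketch below is a different technique from the comment argument: it compares the number of fields in the first two rows using the standard csv module. It assumes, as in the question's sample, that the extra line has fewer fields than the real header; leading_rows_to_skip is a hypothetical helper name, not part of pandas.

```python
import csv
import io


def leading_rows_to_skip(lines):
    """Return 1 if the first CSV row has fewer fields than the second, else 0.

    `lines` is any iterable of text lines, such as an open file object.
    Heuristic only: it assumes a banner line is narrower than the header.
    """
    rows = csv.reader(lines)
    first = next(rows, None)
    second = next(rows, None)
    if first is None or second is None:
        return 0  # fewer than two rows: nothing to compare
    return int(len(first) < len(second))


sample = (
    "Created on 12-11-2018,CryptoDataDownload.com\n"
    "Date,Symbol,Open,High,Low,Close,Volume From,Volume To\n"
    "2018-12-11 11-AM,ADABTC,8.6e-06,8.61e-06,8.55e-06,8.57e-06,301141.7,2.59\n"
)

# The banner has 2 fields and the header has 8, so one row should be skipped.
print(leading_rows_to_skip(io.StringIO(sample)))
```

With a real file, the function can be called on an open file object (opened with newline='' as the csv module recommends), and the result passed to pd.read_csv as skiprows.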














edited Jan 2 at 6:03

























answered Dec 12 '18 at 9:03 by Ernest S Kirubakaran













  • Doesn't work correctly. I have 8 columns after that line but this creates a df of 5 columns because of a column name starting with 'C' (Close).

    – iso_9001_
    Dec 12 '18 at 11:04






  • AFAIK the comment parameter must be a single character, so a whole word will not work.

    – iso_9001_
    Dec 31 '18 at 22:18


















