PDFBox - Accessible PDF - How to check if PDF Tags have properties as per Accessibility guidelines

I need to check whether the tags in a PDF have the properties required by accessibility guidelines.
Examples:

  • H1 - validate that an H1 exists in the PDF

  • Image (Figure tag) - validate that each figure has alt text

  • Language - validate that the language property is set so that a screen reader reads the document properly. For Spanish and English documents, the respective language codes should be set

  • Tables - access the table object and validate that the table structure is proper (header columns match the row columns, etc.)
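
The language item in the list above could be checked roughly like this. It is a minimal sketch, assuming the PDFBox 2.0.x API; the class `LanguageCheck` and its method names are hypothetical, not part of PDFBox. It only reads the document-level language entry on the catalog and compares the primary subtag, so "es-MX" counts as Spanish. Structure elements can carry their own language as well, which this sketch does not look at.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox.
class LanguageCheck {

    /** Returns the document-level language entry, or null if it is absent or blank. */
    static String documentLanguage(PDDocument doc) {
        String lang = doc.getDocumentCatalog().getLanguage();
        return (lang == null || lang.trim().isEmpty()) ? null : lang.trim();
    }

    /** True if the primary subtag of the document language equals expectedPrimary, e.g. "es" for "es-MX". */
    static boolean hasLanguage(PDDocument doc, String expectedPrimary) {
        String lang = documentLanguage(doc);
        if (lang == null) {
            return false;
        }
        // Locale.forLanguageTag parses BCP 47 tags; an ill-formed tag yields an empty language.
        String primary = Locale.forLanguageTag(lang).getLanguage();
        return primary.equalsIgnoreCase(expectedPrimary);
    }
}
```

With PDFBox 2.0.x the document would typically be opened with `PDDocument.load(file)` and then passed in, for example `LanguageCheck.hasLanguage(doc, "es")` for a Spanish document.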


So far I have been able to:

  • Extract the metadata and validate that the document has a proper Title, Subject and Producer, using PDDocument.getDocumentInformation().getMetadataKeys();

  • Validate whether the PDF is marked as tagged (accessible) by checking the PDDocument.getDocumentCatalog().getMarkInfo().isMarked() flag
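
For reference, the two checks above might look roughly like this when combined. This is only a sketch under the assumption of PDFBox 2.0.x; `BasicChecks` and `findProblems` are hypothetical names. It uses the typed getters on the document information instead of iterating over the metadata keys, and it guards against `getMarkInfo()` returning null, which happens when the catalog has no mark info entry.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class BasicChecks {

    /** Returns a human-readable list of problems; an empty list means all basic checks passed. */
    static List<String> findProblems(PDDocument doc) {
        List<String> problems = new ArrayList<>();

        PDDocumentInformation info = doc.getDocumentInformation();
        if (isBlank(info.getTitle())) {
            problems.add("missing Title");
        }
        if (isBlank(info.getSubject())) {
            problems.add("missing Subject");
        }
        if (isBlank(info.getProducer())) {
            problems.add("missing Producer");
        }

        // getMarkInfo() is null when the catalog has no mark info dictionary at all.
        if (doc.getDocumentCatalog().getMarkInfo() == null
                || !doc.getDocumentCatalog().getMarkInfo().isMarked()) {
            problems.add("document is not marked as tagged");
        }
        return problems;
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Note that the marked flag only says the file claims to be tagged; it says nothing about whether the tags themselves are correct.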


To access the tags, I have tried these options:

  • getDocumentCatalog().getAcroForm() returns null

  • PDDocument.getDocumentCatalog().getPages().get(0).getAnnotations(); returns null

  • I tried looping through PDDocument.getDocumentCatalog().getStructureTreeRoot().getKids(), but it returns only one object, of type StructElem
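
A single kid under the structure tree root is common: many tagged PDFs have one top-level element there (often a Document element), with the actual tags nested below it. Whether that is the case for this particular file is an assumption. If so, the kids have to be followed recursively, along these lines. This is a sketch assuming PDFBox 2.0.x; `TagWalker` and its methods are hypothetical names.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class TagWalker {

    /** Collects every structure element in the document, at any depth, in tree order. */
    static List<PDStructureElement> allElements(PDDocument doc) {
        List<PDStructureElement> out = new ArrayList<>();
        PDStructureTreeRoot root = doc.getDocumentCatalog().getStructureTreeRoot();
        if (root != null) {
            collect(root, out);
        }
        return out;
    }

    private static void collect(PDStructureNode node, List<PDStructureElement> out) {
        for (Object kid : node.getKids()) {
            // Kids can also be marked-content references, object references or
            // integer MCIDs; only structure elements have kids of their own.
            if (kid instanceof PDStructureElement) {
                PDStructureElement element = (PDStructureElement) kid;
                out.add(element);
                collect(element, out);
            }
        }
    }

    /** Returns the tag name (structure type) of every element, e.g. Document, H1, P. */
    static List<String> tagNames(PDDocument doc) {
        List<String> names = new ArrayList<>();
        for (PDStructureElement element : allElements(doc)) {
            names.add(element.getStructureType());
        }
        return names;
    }
}
```

Printing `TagWalker.tagNames(doc)` should give a quick overview of which tags the file really contains before writing any specific validation.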


The accessible PDF is created with OpenText, so the dev team doesn't know about PDFBox.
I am lost as to how to get access to the tags/objects (use MarkedContent or something else?).

Please suggest how to extract the individual objects (tags) such as P, H1, Table, Figure/Image and validate their properties.
Note: Manual validation of these properties is performed using Adobe Acrobat Pro.
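
To make the request concrete, the H1 and Figure checks might be sketched as below. This assumes PDFBox 2.0.x and assumes the file uses the standard tag names directly; `TagChecks` and its methods are hypothetical. If the file maps custom tag names to standard ones through a role map, comparing `getStructureType()` against "H1" or "Figure" as done here would miss them.

```java
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class TagChecks {

    /** True if any element below node (at any depth) has the given structure type. */
    static boolean hasTag(PDStructureNode node, String type) {
        for (Object kid : node.getKids()) {
            if (kid instanceof PDStructureElement) {
                PDStructureElement element = (PDStructureElement) kid;
                if (type.equals(element.getStructureType()) || hasTag(element, type)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns all Figure elements below node whose alternate description is missing or blank. */
    static List<PDStructureElement> figuresWithoutAlt(PDStructureNode node) {
        List<PDStructureElement> missing = new ArrayList<>();
        collectFiguresWithoutAlt(node, missing);
        return missing;
    }

    private static void collectFiguresWithoutAlt(PDStructureNode node, List<PDStructureElement> missing) {
        for (Object kid : node.getKids()) {
            if (!(kid instanceof PDStructureElement)) {
                continue; // skip content references and MCIDs
            }
            PDStructureElement element = (PDStructureElement) kid;
            if ("Figure".equals(element.getStructureType())) {
                String alt = element.getAlternateDescription();
                if (alt == null || alt.trim().isEmpty()) {
                    missing.add(element);
                }
            }
            collectFiguresWithoutAlt(element, missing);
        }
    }
}
```

Both methods would be called with the structure tree root, for example `TagChecks.hasTag(doc.getDocumentCatalog().getStructureTreeRoot(), "H1")`, after checking that the root is not null.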










  • Please share a PDF that has these tags and ping me. What you are looking for is in the structure tree; also look at it with PDFDebugger.
    – Tilman Hausherr
    2 days ago

  • @TilmanHausherr Thank you for the response. Unfortunately, I cannot share the PDF that I have to validate. Here are sample PDFs. In the first PDF, I need to validate that the fox and dog images have proper alt texts: gitlab.itextsupport.com/itext7/samples/raw/develop/publications/… In the second PDF, I need to validate that the table structure is right, i.e. that it has TH and TBody tags and each TD is mapped: gitlab.itextsupport.com/itext7/samples/raw/develop/publications/…
    – Sachin G
    yesterday

  • I had a look… this is trickier than I thought, sorry. If you look at it with PDFDebugger you'll see why :-(
    – Tilman Hausherr
    23 hours ago

  • No problem, thank you for checking. I haven't used PDFDebugger before; I will check.
    – Sachin G
    23 hours ago
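
For the table check mentioned in the comments (TH, TBody, TD), one possible starting point is to count the cells in each row and compare the counts. This is a deliberately simplified sketch, assuming PDFBox 2.0.x and standard tag names; `TableChecks` and its methods are hypothetical. It ignores ColSpan and RowSpan attributes, so a correctly tagged table with spanning cells would be reported as irregular.

```java
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class TableChecks {

    /** Returns the number of TH/TD cells of each TR under a Table, looking through THead/TBody/TFoot. */
    static List<Integer> rowCellCounts(PDStructureElement table) {
        List<Integer> counts = new ArrayList<>();
        collectRows(table, counts);
        return counts;
    }

    private static void collectRows(PDStructureNode node, List<Integer> counts) {
        for (Object kid : node.getKids()) {
            if (!(kid instanceof PDStructureElement)) {
                continue; // skip content references and MCIDs
            }
            PDStructureElement element = (PDStructureElement) kid;
            String type = element.getStructureType();
            if ("TR".equals(type)) {
                counts.add(countCells(element));
            } else if ("THead".equals(type) || "TBody".equals(type) || "TFoot".equals(type)) {
                collectRows(element, counts); // rows may be grouped one level down
            }
        }
    }

    private static int countCells(PDStructureElement row) {
        int cells = 0;
        for (Object kid : row.getKids()) {
            if (kid instanceof PDStructureElement) {
                String type = ((PDStructureElement) kid).getStructureType();
                if ("TH".equals(type) || "TD".equals(type)) {
                    cells++;
                }
            }
        }
        return cells;
    }

    /** True if the table has at least one row and every row has the same non-zero cell count. */
    static boolean isRectangular(PDStructureElement table) {
        List<Integer> counts = rowCellCounts(table);
        if (counts.isEmpty() || counts.get(0) == 0) {
            return false;
        }
        for (int count : counts) {
            if (count != counts.get(0)) {
                return false;
            }
        }
        return true;
    }
}
```

The Table elements themselves would first have to be located by walking the structure tree recursively from the root.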
















java pdf accessibility pdfbox






asked Dec 27 '18 at 16:41 by Sachin G
