PDFBox - Accessible PDF - How to check if PDF tags have properties as per accessibility guidelines

I need to check whether the PDF tags have properties that meet accessibility guidelines.
Examples:
- H1 - validate that an H1 exists in the PDF
- Image (Figure tag) - validate that each image/figure has Alt text
- Language - validate that the language property is set so that a screen reader will read the text properly. Spanish and English documents should carry their respective language codes
- Tables - access the table object and validate that the table structure is correct (e.g. the number of header columns matches the number of columns in each row)
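Most of these checks operate on the structure tree (the /StructTreeRoot in the document catalog) rather than on page content or annotations. The H1 and Figure checks can be sketched as below, assuming PDFBox 2.0.x (in 3.0, PDDocument.load is replaced by Loader.loadPDF); TagChecklist, allElements, hasH1 and figuresWithoutAlt are hypothetical names, not PDFBox API. The code uses getStandardStructureType() so that custom tag names mapped through the document's RoleMap count as their standard type; as far as I can tell that method follows only one level of mapping, so chained role maps may need extra handling. There is also no protection against a malformed tree that loops back on itself.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;

public class TagChecklist {

    // Depth-first walk that collects every structure element below the given node.
    static void collect(PDStructureNode node, List<PDStructureElement> out) {
        for (Object kid : node.getKids()) {
            if (kid instanceof PDStructureElement) {
                PDStructureElement elem = (PDStructureElement) kid;
                out.add(elem);
                collect(elem, out);
            }
            // Integer MCIDs, PDMarkedContentReference and PDObjectReference kids point
            // at page content or annotations rather than at further tags, so skip them.
        }
    }

    // All structure elements of the document; empty if the PDF has no structure tree.
    static List<PDStructureElement> allElements(PDDocument doc) {
        List<PDStructureElement> out = new ArrayList<>();
        PDStructureTreeRoot root = doc.getDocumentCatalog().getStructureTreeRoot();
        if (root != null) {
            collect(root, out);
        }
        return out;
    }

    // True if some element is an H1, either directly or through the role map.
    static boolean hasH1(List<PDStructureElement> elems) {
        for (PDStructureElement e : elems) {
            if ("H1".equals(e.getStandardStructureType())) {
                return true;
            }
        }
        return false;
    }

    // Figure elements whose /Alt entry is missing or blank.
    static List<PDStructureElement> figuresWithoutAlt(List<PDStructureElement> elems) {
        List<PDStructureElement> missing = new ArrayList<>();
        for (PDStructureElement e : elems) {
            String alt = e.getAlternateDescription();
            if ("Figure".equals(e.getStandardStructureType())
                    && (alt == null || alt.trim().isEmpty())) {
                missing.add(e);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            List<PDStructureElement> elems = allElements(doc);
            System.out.println("H1 present: " + hasH1(elems));
            System.out.println("Figures without Alt: " + figuresWithoutAlt(elems).size());
        }
    }
}
```

Run it with the PDF path as the first argument and PDFBox on the classpath.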
So far I was able to:
- Extract the metadata and validate that the document has a proper Title, Subject and Producer via
PDDocument.getDocumentInformation().getMetadataKeys();
- Validate whether the PDF is tagged by checking the flag returned by
PDDocument.getDocumentCatalog().getMarkInfo().isMarked();
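The metadata and MarkInfo checks above, plus the document language from the Language item, can be combined into one routine. A minimal sketch, assuming PDFBox 2.0.x; DocumentLevelChecks, check and the expectedLanguage parameter (a primary language subtag such as "en" or "es") are hypothetical. One detail worth guarding: getMarkInfo() returns null when the catalog has no /MarkInfo dictionary, so calling isMarked() on it directly can throw a NullPointerException. A set Marked flag only says the file claims to be tagged; it does not prove the tags are correct. Individual structure elements can also override the document language with their own /Lang (PDStructureElement.getLanguage()), which this sketch does not check.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDMarkInfo;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;

public class DocumentLevelChecks {

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns human-readable problems; an empty list means every check passed.
    // expectedLanguage is a primary language subtag such as "en" or "es".
    static List<String> check(PDDocument doc, String expectedLanguage) {
        List<String> problems = new ArrayList<>();

        PDDocumentInformation info = doc.getDocumentInformation();
        if (isBlank(info.getTitle())) problems.add("missing Title");
        if (isBlank(info.getSubject())) problems.add("missing Subject");
        if (isBlank(info.getProducer())) problems.add("missing Producer");

        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        // getMarkInfo() is null when there is no /MarkInfo dictionary.
        PDMarkInfo markInfo = catalog.getMarkInfo();
        if (markInfo == null || !markInfo.isMarked()) problems.add("not marked as tagged");

        PDStructureTreeRoot root = catalog.getStructureTreeRoot();
        if (root == null) problems.add("no structure tree");

        // Document-wide /Lang entry, e.g. "en-US" or "es-ES".
        String lang = catalog.getLanguage();
        if (isBlank(lang)) {
            problems.add("missing document language");
        } else {
            String primary = lang.trim().split("-")[0].toLowerCase(Locale.ROOT);
            if (!primary.equals(expectedLanguage.toLowerCase(Locale.ROOT))) {
                problems.add("language is " + lang + ", expected " + expectedLanguage);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            // args[1] is the expected language subtag, e.g. "en" or "es"
            System.out.println(check(doc, args[1]));
        }
    }
}
```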
To access the tags, I have tried these options:
- getDocumentCatalog().getAcroForm() returns null
- PDDocument.getDocumentCatalog().getPages().get(0).getAnnotations(); returns null
- I tried looping through PDDocument.getDocumentCatalog().getStructureTreeRoot().getKids(), but it returns only one StructElem type object
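The single StructElem returned by getStructureTreeRoot().getKids() is most likely the top-level element (often a Document tag); P, H1, Table and Figure are its descendants, so each element's own getKids() has to be walked recursively. The kids of an element can be further PDStructureElement objects, Integer MCIDs, PDMarkedContentReference or PDObjectReference objects. A minimal sketch that prints the whole tree, assuming PDFBox 2.0.x; StructureTreeDump, dump and indent are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDMarkedContentReference;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDObjectReference;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;

public class StructureTreeDump {

    static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    // Appends one line per kid of the node, indented by depth, then recurses.
    static void dump(PDStructureNode node, int depth, StringBuilder out) {
        for (Object kid : node.getKids()) {
            indent(out, depth);
            if (kid instanceof PDStructureElement) {
                PDStructureElement elem = (PDStructureElement) kid;
                String type = elem.getStructureType();
                String std = elem.getStandardStructureType();
                out.append(type);
                if (std != null && !std.equals(type)) {
                    out.append(" (role-mapped to ").append(std).append(')');
                }
                if (elem.getAlternateDescription() != null) {
                    out.append(" Alt=\"").append(elem.getAlternateDescription()).append('"');
                }
                if (elem.getLanguage() != null) {
                    out.append(" Lang=").append(elem.getLanguage());
                }
                out.append('\n');
                dump(elem, depth + 1, out);
            } else if (kid instanceof Integer) {
                // Bare MCID: marked content on the page referenced by /Pg
                out.append("MCID ").append(kid).append('\n');
            } else if (kid instanceof PDMarkedContentReference) {
                out.append("MCR ").append(((PDMarkedContentReference) kid).getMCID()).append('\n');
            } else if (kid instanceof PDObjectReference) {
                out.append("OBJR (annotation or XObject)\n");
            } else {
                out.append(kid).append('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDStructureTreeRoot root = doc.getDocumentCatalog().getStructureTreeRoot();
            if (root == null) {
                System.out.println("No structure tree: the PDF is not tagged.");
                return;
            }
            StringBuilder out = new StringBuilder();
            dump(root, 0, out);
            System.out.print(out);
        }
    }
}
```

The printed outline should roughly match the Tags panel in Acrobat Pro, which makes it easy to compare against the manual validation.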
The accessible PDFs are created with OpenText, so the dev team doesn't know about PDFBox.
I am lost as to how to get access to the tags/objects (use MarkedContent or something else?).
Please suggest how to extract the individual objects (tags) such as P, H1, Table and Figure/Image and validate their properties.
Note: manual validation of these properties is performed using Adobe Acrobat Pro.
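On the MarkedContent question: the tags themselves live in the structure tree, and an Integer kid of a structure element is an MCID that points at a marked-content sequence (BDC ... EMC) in the content stream of the element's page. To see which text a tag actually covers, one option is PDFBox's PDFMarkedContentExtractor. A minimal sketch, assuming PDFBox 2.0.x; MarkedContentText, textOf, collect and textByMcid are hypothetical names. It only builds a per-page MCID-to-text map; because MCIDs are scoped to a page, joining it with the tree means first finding the element's page via PDStructureElement.getPage() (the /Pg entry may sit on an ancestor element instead).

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDMarkedContent;
import org.apache.pdfbox.text.PDFMarkedContentExtractor;
import org.apache.pdfbox.text.TextPosition;

public class MarkedContentText {

    // Concatenates the text of a marked-content sequence, including nested sequences.
    static String textOf(PDMarkedContent mc) {
        StringBuilder sb = new StringBuilder();
        for (Object o : mc.getContents()) {
            if (o instanceof TextPosition) {
                sb.append(((TextPosition) o).getUnicode());
            } else if (o instanceof PDMarkedContent) {
                sb.append(textOf((PDMarkedContent) o));
            }
            // PDXObject entries (e.g. images) carry no text and are ignored.
        }
        return sb.toString();
    }

    // Records MCID -> text for this sequence and any nested sequences that have an MCID.
    static void collect(PDMarkedContent mc, Map<Integer, String> out) {
        if (mc.getMCID() >= 0) {
            out.put(mc.getMCID(), textOf(mc));
        }
        for (Object o : mc.getContents()) {
            if (o instanceof PDMarkedContent) {
                collect((PDMarkedContent) o, out);
            }
        }
    }

    // Runs the extractor over one page and returns its MCID -> text map.
    static Map<Integer, String> textByMcid(PDPage page) throws IOException {
        PDFMarkedContentExtractor extractor = new PDFMarkedContentExtractor();
        extractor.processPage(page);
        Map<Integer, String> result = new TreeMap<>();
        for (PDMarkedContent mc : extractor.getMarkedContents()) {
            collect(mc, result);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            int pageNo = 1;
            for (PDPage page : doc.getPages()) {
                System.out.println("Page " + pageNo + ": " + textByMcid(page));
                pageNo++;
            }
        }
    }
}
```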
java pdf accessibility pdfbox
asked Dec 27 '18 at 16:41
Sachin G
Please share a PDF that has these tags and ping me. What you are searching for is in the structure tree; also look at it with PDFDebugger.
– Tilman Hausherr
2 days ago
@TilmanHausherr Thank you for the response. Unfortunately I cannot share the PDF that I have to validate, but here are some sample PDFs. In the PDF below, I need to validate that the fox and dog images have proper alt text: gitlab.itextsupport.com/itext7/samples/raw/develop/publications/… In the PDF below, I need to validate that the table structure is right, i.e. that it has TH and TBody tags and that each TD is mapped: gitlab.itextsupport.com/itext7/samples/raw/develop/publications/…
– Sachin G
yesterday
I had a look… this is trickier than I thought, sorry. If you look at it with PDFDebugger you'll see why :-(
– Tilman Hausherr
23 hours ago
No problem, thank you for checking. I haven't used PDFDebugger before; I will check it out.
– Sachin G
23 hours ago
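For the table sample mentioned above, the structure check can be sketched the same way: a Table element should contain TR rows (directly or inside THead/TBody/TFoot), each TR should contain only TH/TD cells, and every row should span the same number of columns. A minimal sketch, assuming PDFBox 2.0.x; TableStructureCheck, childElements, colSpan, check and checkAll are hypothetical names. It reads ColSpan only from Table-owned attribute objects in the cell's /A entry; RowSpan, attribute classes (/C) and the Headers/Scope attributes that tie TD cells to their TH cells are not handled, so a table with row spans will be reported as inconsistent.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDAttributeObject;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.Revisions;
import org.apache.pdfbox.pdmodel.documentinterchange.taggedpdf.PDTableAttributeObject;

public class TableStructureCheck {

    // Direct children that are structure elements; MCIDs and object references are skipped.
    static List<PDStructureElement> childElements(PDStructureNode node) {
        List<PDStructureElement> out = new ArrayList<>();
        for (Object kid : node.getKids()) {
            if (kid instanceof PDStructureElement) out.add((PDStructureElement) kid);
        }
        return out;
    }

    // ColSpan from a Table-owned attribute object in the cell's /A entry, default 1.
    static int colSpan(PDStructureElement cell) {
        Revisions<PDAttributeObject> attrs = cell.getAttributes();
        for (int i = 0; i < attrs.size(); i++) {
            PDAttributeObject a = attrs.getObject(i);
            if (a instanceof PDTableAttributeObject) return ((PDTableAttributeObject) a).getColSpan();
        }
        return 1;
    }

    // Problems found in one Table element; an empty list means none were detected.
    static List<String> check(PDStructureElement table) {
        List<String> problems = new ArrayList<>();
        List<PDStructureElement> rows = new ArrayList<>();
        for (PDStructureElement kid : childElements(table)) {
            String t = kid.getStandardStructureType();
            if ("TR".equals(t)) {
                rows.add(kid);
            } else if ("THead".equals(t) || "TBody".equals(t) || "TFoot".equals(t)) {
                // Row groups should contain only TR elements.
                for (PDStructureElement row : childElements(kid)) {
                    String rt = row.getStandardStructureType();
                    if ("TR".equals(rt)) rows.add(row);
                    else problems.add(t + " contains " + rt);
                }
            } else if (!"Caption".equals(t)) {
                problems.add("Table contains " + t);
            }
        }
        boolean hasHeader = false;
        int expected = -1; // column count of the first row
        for (int i = 0; i < rows.size(); i++) {
            int width = 0;
            for (PDStructureElement cell : childElements(rows.get(i))) {
                String t = cell.getStandardStructureType();
                if ("TH".equals(t)) {
                    hasHeader = true;
                } else if (!"TD".equals(t)) {
                    problems.add("row " + i + " contains " + t);
                    continue;
                }
                width += colSpan(cell);
            }
            if (expected < 0) {
                expected = width;
            } else if (width != expected) {
                problems.add("row " + i + " spans " + width + " columns, expected " + expected);
            }
        }
        if (rows.isEmpty()) problems.add("no TR rows");
        else if (!hasHeader) problems.add("no TH header cells");
        return problems;
    }

    // Recursively finds Table elements below the node and prints their problems.
    static void checkAll(PDStructureNode node) {
        for (PDStructureElement e : childElements(node)) {
            if ("Table".equals(e.getStandardStructureType())) System.out.println("Table: " + check(e));
            checkAll(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDStructureTreeRoot root = doc.getDocumentCatalog().getStructureTreeRoot();
            if (root != null) checkAll(root);
        }
    }
}
```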