Accessibility Minute - December 2020

Choosing Accessible PDFs

PDFs are everywhere. However, not all PDFs are accessible; it depends on how the PDFs are created and modified. Creating fully accessible PDFs is a complex process that requires more training than can fit in a short newsletter. As a starting point, we’ll help you learn to recognize a few key requirements of accessible PDFs.

How Do I Recognize an Accessible PDF?

The first principle of PDF accessibility is that image-based PDFs are inaccessible. Assistive technology requires digital text to interact with, and images do not contain digital text.

You can tell whether a PDF is an image by trying to select text in it. If clicking highlights the entire page rather than specific text, the PDF is an image. If you can select a specific piece of text, the PDF contains selectable digital text, meaning text that can be highlighted, copied, and interacted with. Selectable text is a key feature to check when you’re evaluating a PDF for accessibility, because assistive technology needs it in order to read the document.
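If you have a large batch of PDFs to review, the click test can be roughly approximated with a script. The Python sketch below is only a heuristic and does not replace checking by hand. It assumes that a PDF containing image objects but no font resources is probably a scanned image. The function names (`looks_image_only`, `_searchable_bytes`) are made up for this example, and the check will miss encrypted files and unusual encodings, so treat a "True" result as a prompt to open the file and check it yourself.

```python
import re
import zlib

# Matches the body of every stream object in the raw file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _searchable_bytes(pdf_bytes: bytes) -> bytes:
    """Return the raw bytes plus any streams that inflate cleanly with zlib.

    Newer PDFs can store page resources inside compressed streams, so the
    raw bytes alone may hide the font entries we are looking for.
    """
    parts = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            parts.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not zlib-compressed (for example, a JPEG image); skip it
    return b"\n".join(parts)


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the file has image objects but no font resources."""
    data = _searchable_bytes(pdf_bytes)
    has_font = re.search(rb"/Font\b", data) is not None
    has_image = re.search(rb"/Subtype\s*/Image\b", data) is not None
    return has_image and not has_font


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            flagged = looks_image_only(handle.read())
        print(path, "-> probably image-based" if flagged else "-> has font resources")
```

A PDF that has already been through OCR will normally contain fonts, so this sketch should not flag it. A dedicated PDF library would give more reliable results than scanning raw bytes.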

PDFs created from scanned documents often contain additional accessibility barriers. Scanned images usually have poor image quality that can make the content difficult to read. They are also often marked with handwritten annotations, highlights, and underlining that can't be recognized by assistive technology and may prevent the technology from accessing the underlying text.

Instead of image-based PDFs, try to use PDFs that were created digitally, such as a PDF exported from a Word document or a PowerPoint file. These PDFs may not be fully accessible (there are many more steps to learn before you can be confident that a PDF is completely accessible), but PDFs created from digital documents will generally be more accessible than PDFs created from scanned images. For academic content, Norlin OneSearch can often help you find PDFs that were created in a digital format.

What Should I Do with an Image-Based PDF?

If you can’t find an alternative text-based version of an image-based PDF, you can still improve the accessibility of the document. If you're scanning the document yourself, erase any handwritten notations first to improve the scan quality.

If the PDF doesn’t have selectable text, you can run it through SensusAccess. SensusAccess is a service provided by CU that can perform optical character recognition (OCR) to turn images of text into digital text. To use OCR on your document, select “Accessibility conversion” in Step 3 of the SensusAccess form, then select “pdf - Tagged PDF” as your target format in Step 4.

Please note that the quality of the results from SensusAccess depends on the quality of the original document, and the resulting content may contain errors. You can use Adobe Acrobat Pro to fix any inaccurate text after it has been converted; the software is free for staff and faculty at CU Boulder.

What Next?

A fully accessible PDF must also have properly tagged content and a variety of other features that we don’t have space to explore here. For now, just focus on avoiding image-based PDFs and using SensusAccess on PDFs that don’t have selectable text. Expect more information about PDFs in a future newsletter!

December Challenges

Try reviewing one PDF for accessibility using the tests above. If the PDF is a scanned image, try to find a version that was created from a digital file. If you can't find an alternate version with selectable text, try running the PDF through SensusAccess.