Kim's BotSpot

I'm a faculty member of the Botany Department at the University of Hawai`i at Manoa. My academic specialty is quantitative ecology. Current activities are dominated by research in ethnobotany and examining how Internet-based activities can enhance research and teaching.

Sunday, July 09, 2006

PDF support in PHP

Using PHP to index a PDF Library

One possible PHP approach is described on this web site: http://www.pdfhacks.com/pdfportal/

It comes from the book "PDF Hacks."

PDF Metadata

Should we use metadata in PDF files?

At first glance, it looks like well-structured metadata (e.g., XML) may be the way to add bibliographic citation information to PDF files.
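To make the idea concrete, here is a minimal sketch of what an XML citation record might look like before it is attached to a PDF file. The element names (citation, author, year, and so on) and the sample values are made up for illustration; they are not taken from any PDF metadata standard or from any of the tools mentioned here, and the sketch does not write anything into a PDF.

```python
import xml.etree.ElementTree as ET

def citation_to_xml(citation):
    """Build a small XML record from a dict of citation fields.

    The element names are simply the dict keys; this is a hypothetical
    layout, not an established metadata schema.
    """
    root = ET.Element("citation")
    for field, value in citation.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Placeholder citation, invented for this example.
example = {
    "author": "Doe, J.",
    "year": 2006,
    "title": "An example article",
    "journal": "Journal of Examples",
}
record = citation_to_xml(example)
print(record)
```

A record like this keeps each part of the citation in its own labeled element, which is what would make it easier to pull the citation back out later.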

There is a program that appears to support PDF metadata: Advanced PDF Tools.

Here is their website: http://www.verypdf.com/pdfinfoeditor/pdf-metadata.html

Two versions of the software are relevant: one with a WYSIWYG interface and one that runs from the command line. The website descriptions suggest that this software might do what we need. However, in the short time I tried the program, I could not work out how to get the metadata back out of a PDF file.

Clearly, more time needs to be spent on this software.

Improving PDF support

PDF files are becoming an important component in our document download and storage universe. But there are problems. For example, Adobe Acrobat is slow at opening PDF files. This software is also not very easy to use when filling out forms.

There are additional capabilities needed to organize libraries of PDF documents. For example, we need to be able to turn out lists of documents stored in our PDF libraries.
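As a starting point for such lists, the following sketch walks a folder and reports every PDF file it finds, with its size. It assumes the library is just a directory tree on disk; the function name list_pdfs is my own, and it only looks at file names, not at anything inside the PDF files.

```python
from pathlib import Path

def list_pdfs(library_dir):
    """Return sorted (relative path, size in bytes) pairs for every PDF under library_dir."""
    root = Path(library_dir)
    entries = []
    for path in root.rglob("*"):
        # Match the extension case-insensitively so "REPORT.PDF" is included.
        if path.is_file() and path.suffix.lower() == ".pdf":
            entries.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return sorted(entries)
```

Calling list_pdfs on the top folder of a library gives a list that could be printed or saved as a simple catalog.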

The following notes document a few recent discoveries.

Foxit Pro ($35) is a very fast PDF reader. You can use it as a stand-alone program or as the viewer for downloaded PDF files. There is a "typewriter" option that lets you enter text on a PDF document without having to create a form field; this is handy if you are filling out a one-time form that didn't come with PDF form fields. The annotation features available in Adobe Acrobat are also in Foxit Pro, so you won't be missing those.

Adobe Acrobat Pro does have the advantage of doing OCR on PDF documents. While we have not been considering doing OCR on everything, it might be time to reconsider this possibility. Foxit Pro comes with a program, Foxit Library, that will index all of the text in all of your PDF library documents. You can look down the list of words, click on one, and find the documents and frequency of occurrence of the selected word.
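The kind of word index described above can be sketched in a few lines. This is only an illustration of the idea, not a description of how Foxit Library works internally. It assumes the text has already been extracted from each PDF (by OCR or otherwise) and is passed in as plain strings; the function name and sample documents are invented.

```python
import re
from collections import Counter, defaultdict

def build_word_index(texts):
    """Map each word to {document name: number of occurrences}.

    `texts` maps a document name to its already-extracted plain text.
    """
    index = defaultdict(dict)
    for name, text in texts.items():
        # Lowercase and split on anything that is not a letter or digit.
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for word, n in counts.items():
            index[word][name] = n
    return dict(index)

# Invented sample texts standing in for two PDF documents.
sample = {
    "a.pdf": "Taro and kava. Taro again.",
    "b.pdf": "Kava ceremonies",
}
index = build_word_index(sample)
print(sorted(index))
```

Looking up a word in the index gives the documents that contain it and how often it occurs in each one.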

A-PDF Info Changer is a free program that lets you read and update the basic header fields in a PDF file. This could be useful for entering document identification information. However, the field names don't correspond very closely to the parts of a bibliographic citation, so we would need to decide how to fit citation information into them.
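One way to face the mismatch is to agree on a convention for packing a citation into the few fields that are available. The sketch below assumes the fields are the standard PDF document information entries Title, Author, Subject, and Keywords; I have not confirmed that these are exactly the fields A-PDF Info Changer exposes. The packing scheme (journal, volume, pages, and year squeezed into Subject) is purely hypothetical, and the sample citation is invented.

```python
def citation_to_info_fields(citation):
    """Pack a citation dict into Title/Author/Subject/Keywords strings.

    Hypothetical convention: the publication details that have no field
    of their own are combined into the Subject string.
    """
    source = "{journal} {volume}: {pages} ({year})".format(**citation)
    return {
        "Title": citation["title"],
        "Author": "; ".join(citation["authors"]),
        "Subject": source,
        "Keywords": ", ".join(citation.get("keywords", [])),
    }

# Placeholder citation, invented for this example.
fields = citation_to_info_fields({
    "title": "An example article",
    "authors": ["Doe, J.", "Roe, R."],
    "journal": "Journal of Examples",
    "volume": 1,
    "pages": "1-10",
    "year": 2006,
})
print(fields["Subject"])
```

The drawback of any scheme like this is that the citation has to be parsed back out of a free-text string, which is the argument for structured metadata instead.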