myown1.com

Linux tips & tricks.

Filling out pdf forms



How to fill out PDF forms in Linux (batch mode).

Adobe Portable Document Format (PDF) documents can be created with blank fields that the user can fill in. The US Internal Revenue Service is a popular source of forms that use this feature. That's very convenient if you want to fill out the forms interactively in Adobe Reader. It's less convenient if you want to do calculations in a spreadsheet and then transfer the numbers to the form. One of the reasons I own a computer is to let it do grunt work like copying numbers from one place to another, so I set out to fill out PDF forms automatically. The tools exist, but it takes a while to figure out how to use them, so I wrote this tutorial to capture what I've found out.
Figure: data flow for inserting data into a PDF form.

Here's the process flow I use. Starting from the right, the pdftk program takes specially formatted values (the fdf file), merges them with an existing pdf form, and writes a new pdf file: the filled-out form. On the left is fdf_gen, a program I wrote to get my data into the right format. It's fairly specific to my problem, but may be useful as a starting point for someone else.


What you need.

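As an example, I'm going to prepare the 2007 income tax return for one of our cats. So, the forms and tools we need are the 2007 Form 1040 (f1040.pdf), available from the IRS web site, and the pdftk program. My fdf_gen program (see Downloads) is optional.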

Filling the form

The command to create a filled-out form is
pdftk f1040.pdf fill_form f1040_kat.fdf output f1040_kat.pdf
where f1040_kat.fdf defines the contents of each field, and f1040_kat.pdf is the new pdf with the values inserted. The fdf file (Forms Data Format) uses the same object syntax as PDF itself, not PostScript, and looks like this:
%FDF-1.2
1 0 obj<</FDF<< /Fields[
<</T(f1_04(0))/V(Katherine(Kat))>>
<</T(f1_05\(0\))/V(Astrofic)>>
<</T(c1_03(0))/V(a)>>
% And lots more lines like these.
] >> >>
endobj
trailer
<</Root 1 0 R>>
%%EOF
The top 2 lines and bottom 5 lines are standard headers and trailers, which should never need to be changed. On each line in between, the text in parentheses after /T is the name of a field in the form, and the text in parentheses after /V is the value to be written into that field. Parentheses inside a name or value must either be balanced, as in Katherine(Kat), or escaped with a backslash, as in f1_05\(0\). For example, f1_04(0) is the "First Name" field.
Figure: form 1040 name field, filled out.

Finding the fields.

We find out what fields are in the form using the command
pdftk f1040.pdf dump_data_fields >f1040_fields.txt
The important parts of the output file are the lines:
FieldType: Text
FieldName: f1_01(0)
Most fields are of type "Text"; we'll talk about FieldType "Button" next. The FieldName is just a character string that labels the field. Unfortunately, the names have no relationship to the line numbers or anything else printed on the form. The only reliable way to figure out what's what is to put a dummy value into a field, fill the form, and see where the value shows up.

Filling checkboxes

Checkboxes are handled by a Button FieldType. Here's one of the more complex examples from the 2007 f1040.pdf.
FieldType: Button
FieldName: c1_03(0)
FieldFlags: 0
FieldJustification: Left
FieldStateOption: Off
FieldStateOption: Yes
FieldStateOption: a
FieldStateOption: b
FieldStateOption: c
FieldStateOption: d
The FieldStateOption lines define the allowed values for the checkbox field. Most have just two options: Off (no box checked) and Yes (check the box). In this case, there are five possible choices besides Off. Naturally, the option values have absolutely no relationship to anything actually printed on the form, so we have to try the values until we get the one we want. And here it is.
Figure: form 1040 filing status boxes, filled out.

Some hints on usage

These notes describe what I've seen on IRS forms; other forms may have other quirks.

Converting data to fdf format.

I wrote the program fdf_gen.c to automate part of the process of creating an fdf file. It works on some simple test cases, but hasn't had any extensive validation. In other words, if you're going to use it for something critical like real tax forms, you really need to double-check the output to make sure it's doing what you want it to do.

In this case, I generate the fdf file using the command
fdf_gen f1040.flds kat.in kat.fdf
where f1040.flds assigns a data type and a more descriptive name to each value to be entered, and kat.in contains the input values. Typical entries in f1040.flds are:

string       LblLastName  f1_05(0)
string3      LblSSN       f1_06(0) f1_07(0) f1_08(0)
dollar_cents L7           f1_44(0) f1_45(0)
where the first item is the type of data, the next item is my descriptive name, and the rest of the line contains the field or fields the value will be written into.

File kat.in just contains descriptive names and values:

LblLastName Astrofic
LblSSN -123-45-feed
L7 77.25
My data types are:

Screenshot of a completed form.

Sample income tax return (for our two cats)

Downloads

File fdf_gen_20080304.tgz is a gzipped tar file containing the program and the sample field values used to create the dummy tax return developed here. I don't include f1040.pdf; I think the IRS has enough bandwidth to supply that without my help. The program is licensed under the GNU GPL. You may have to right-click the link and select "Save as ..." (or your browser's equivalent) to keep the file from displaying as random characters in your browser.

This program was originally published in 2008. As of March 2012, Greg Lawson is also working on this code as part of an open source tax project. You may want to check his repository for more recent updates. See the links below.

  1. A famous slogan during the American Revolution was "No taxation without representation". We won that war so now we have taxation with representation, and the United States Internal Revenue Service is a popular agency that enjoys the enthusiastic support of all Americans. Or maybe not. Anyway, their site has lots of pdf files available for download.
  2. The author of the pdftk program has a website at www.accesspdf.com. It has the program, mailing lists, and links to purchase a book, PDF Hacks. I haven't purchased the book but the program is great, so I assume the book will be too.
  3. Greg Lawson is working on a tool for open source tax form processing. His website is at github.com/GregLawson/Open-Table-Explorer/wiki/6.1-Taxes and the repository for his updates to fdf_gen is at github.com/GregLawson/fdf_gen.

Last modified 2012/03/19 14:58:21