Convert PDF to CSV using Tabula


Okay, another way to convert PDF to CSV is to use Tabula from Python.

Tabula is free and simple to use with Python. It has very good documentation on its website (https://tabula-py.readthedocs.io/en/latest/index.html).

The requirements for Tabula are simple: just Java and Python. The documentation says it needs Python 3.7, but it ran fine for me with 3.6.
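Before installing, it can help to confirm both requirements are in place. The snippet below is only a minimal sketch using the Python standard library: the helper name check_requirements is made up for this post, and the 3.6 minimum is simply the version that worked for me, not an official limit.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6)):
    """Report whether Python is new enough and a Java runtime is on PATH.

    min_python is an assumption based on the version that worked here,
    not an official tabula-py requirement.
    """
    return {
        "python_ok": sys.version_info >= min_python,
        "java_path": shutil.which("java"),  # None when Java is not found
    }


if __name__ == "__main__":
    status = check_requirements()
    print("Python OK:", status["python_ok"])
    print("Java found at:", status["java_path"])
```

If java_path comes back as None, install Java (or add it to your PATH) before running Tabula.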

Installation is easy, as usual. On Linux Mint, just type:

pip install tabula-py

It takes a few more steps on Windows. The tabula-py documentation linked above covers them.

As with the PDFTables API, the basic code to convert PDF to CSV is very simple. Here is the sample code.

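# Import the required module
import tabula

# Read the PDF. read_pdf returns a list of DataFrames (one per detected table),
# so [0] picks only the first table
df = tabula.read_pdf("your_pdf.pdf", pages='all')[0]
print(df)

# Convert all pages of the PDF into one CSV file
tabula.convert_into("your_pdf.pdf", "your_csv_result.csv", output_format="csv", pages='all')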
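Note that the read_pdf call above keeps only the first table because of the [0]. If you want every detected table in one DataFrame, something like the following might work. This is just a sketch, not part of the original sample: the helper names csv_path_for and convert_all_tables are made up here, and it assumes the tables share the same columns (pandas fills missing columns with NaN otherwise).

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Return a CSV path next to the PDF, e.g. your_pdf.pdf -> your_pdf.csv."""
    return Path(pdf_path).with_suffix(".csv")


def convert_all_tables(pdf_path):
    """Combine every table tabula finds in the PDF and save it as one CSV."""
    # Imported here so csv_path_for works even without tabula-py installed.
    import pandas as pd
    import tabula

    # read_pdf returns a list of DataFrames, one per detected table
    tables = tabula.read_pdf(str(pdf_path), pages="all")
    if not tables:
        raise ValueError(f"No tables detected in {pdf_path}")

    # Stack the tables on top of each other with a fresh row index
    combined = pd.concat(tables, ignore_index=True)
    out_path = csv_path_for(pdf_path)
    combined.to_csv(out_path, index=False)
    return out_path
```

Calling convert_all_tables("your_pdf.pdf") would then write your_pdf.csv in the same folder as the PDF.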

The important thing is that it is free, with no limitations. I tried converting a file with more than 100 pages (around 1,000 records), and it finished faster than I expected.

The sample code, PDF, and output are available at https://github.com/syahmins/PDFtoXLS