Generating PDFs Using Python With Google Sheets As The Source

This stemmed from a project and my approach to the solution. The goal of the project was to create 500 PDFs and email those PDFs out using the Google Suite of products.

I have alot of experience with Google Apps Scripts and have generated PDFs before. However, it is always slightly challenging due to rate limits. As a result, I decided to use Google Colab to create the PDFs. The results are incredibly impressive compared to Google Apps Script. In addition to generating PDFs, I’ll also show how to read in data from a Google Sheets file.

You can read about the email solution to this here.

Google Colab Notebook

  • Since Colab is a virtual environment, libraries and permissions need to be installed/granted each session. From my experience, installing libraries is fast.

Importing Libraries

#import libraries

import gspread
import pandas as pd
from google.colab import auth
from google.auth import default

from google.colab import drive
import pandas as pd
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

from fpdf import FPDF
from datetime import date, datetime
import os

#Credtionals
auth.authenticate_user()
creds, _ = default()

gc = gspread.authorize(creds)

Reading Data From Google Sheets & Sample DataFrame


# Open worksheets
sh = gc.open_by_url('Replace with Google Sheets URL')
worksheet_stock_list = sh.worksheet("Stock List")
worksheet_ohlc = sh.worksheet("OHLC")

# Read data into Pandas DataFrames
data_stock_list = worksheet_stock_list.get_all_values()
df_stock_list = pd.DataFrame(data_stock_list[1:], columns=data_stock_list[0])

data_ohlc = worksheet_ohlc.get_all_values()
df_ohlc = pd.DataFrame(data_ohlc[1:], columns=data_ohlc[0])

# Display DataFrames I
df_stock_list.head()

Link to notebook: GitHub

Creating Many PDFs And Saving To Google Drive

You’ll need to view the notebook for the full code, but here is a portion of the key code

# Mount Google Drive
drive.mount('/content/drive')

def run_process_for_symbol(symbol, df_stock_list, df_ohlc):
    today = date.today().strftime("%Y-%m-%d")

    # Filter data for the specified symbol (Stock List) and transpose it
    # Filter data for the specified symbol (Stock List) and transpose it
    stock_list_data = df_stock_list[df_stock_list['Symbol'] == symbol].values.tolist()


    # Create a PDF document
    pdf = PDF(symbol, today)
    pdf.add_page()

    # Stock List table (transposed) with adjusted column width and no borders
    pdf.chapter_title('Stock List', no_border=True)
    column_widths_stock_list = [60, 90]  # Adjust these widths based on your data
    pdf.chapter_body([df_stock_list.columns.tolist()] + stock_list_data, col_widths=column_widths_stock_list, no_border=True, transpose=True)

    # OHLC table with the correct order of columns, transposed, and the first column dropped
    pdf.chapter_title('OHLC')
    ohlc_data = df_ohlc[df_ohlc['Symbol'] == symbol][['Date', 'Open', 'High', 'Low', 'Close']].values.tolist()
    ohlc_column_widths = [25, 35, 25, 25, 25]  # Adjusted column widths for OHLC
    pdf.chapter_body([['Date', 'Open', 'High', 'Low', 'Close']] + ohlc_data, col_widths=ohlc_column_widths, no_border=False, is_ohlc=True)

    # Save PDF to folder
    output_folder = 'Foler Path'

    # Create the folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)

    pdf_path = os.path.join(output_folder, f'{symbol}_Summary.pdf')
    pdf.output(pdf_path)

    return pdf_path

def run_process_for_all_symbols(df_stock_list, df_ohlc):
    start_time = datetime.now()

    # Get unique symbols from the stock list
    symbols = df_stock_list['Symbol'].unique()

    # Run the process for each symbol
    for symbol in symbols:
        try:
            pdf_path = run_process_for_symbol(symbol, df_stock_list, df_ohlc)
            print(f"PDF generated and saved at: {pdf_path}")
        except UnicodeEncodeError as e:
            print(f"Error generating PDF for symbol {symbol}: {e}")
            continue

    end_time = datetime.now()
    elapsed_time = end_time - start_time
    print(f"Total elapsed time: {elapsed_time}")

# Example usage
run_process_for_all_symbols(df_stock_list, df_ohlc)

This portion of the code created 500 PDFs in 5 seconds. This is significantly better than Google Apps Scripts in terms of operational efficiency, but requires more upfront work. This project will be continued with another post here:

Sharing

Related Articles

  • All Post
  • Articles
  • Blog Post
  • General Business Automation
  • Portfolio
  • Stock Market & Finance