Read PDF content using Selenium WebDriver

While automating with Selenium, you may encounter a scenario where you have to read and verify the content of a PDF.
In this post, we will see a simple way to read and verify PDF content.


Prerequisites

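1. Download the Apache PDFBox JAR; we will use this API to read the PDF content. The sample script in this post uses the PDFBox 2.x API (PDDocument.load and org.apache.pdfbox.text.PDFTextStripper).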
2. Add the Selenium Standalone JAR and the PDFBox JAR to the build path of your Java project.

Scenario:

1. Launch the Chrome browser and open the URL: http://www.pdf995.com/samples/pdf.pdf
2. Read the PDF content and store it in a String variable.
3. Verify the content.
4. Close the browser.

Sample Script:

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class PDFReader {

	WebDriver driver;

	@BeforeTest
	public void setUp() {
		System.setProperty("webdriver.chrome.driver", "C:\\gridsetup\\chromedriver.exe");
		driver = new ChromeDriver();
	}

	@AfterTest
	public void tearDown() {
		// Close the browser here so it also closes when the assertion fails.
		if (driver != null) {
			driver.quit();
		}
	}

	@Test
	public void verifyPDFContent() throws Exception {
		String url = "http://www.pdf995.com/samples/pdf.pdf";
		// Launch the Chrome browser and open the URL.
		driver.get(url);
		// Read the PDF content and store it in a String variable.
		String pdfContent = readPDFContent(driver.getCurrentUrl());
		// Verify the content.
		Assert.assertTrue(
				pdfContent.contains("Pdf995 makes it easy and affordable to create professional-quality documents"));
	}

	public String readPDFContent(String appUrl) throws Exception {
		URL url = new URL(appUrl);
		// try-with-resources closes the document first, then the stream, even if parsing fails.
		try (InputStream is = new BufferedInputStream(url.openStream());
				PDDocument document = PDDocument.load(is)) {
			// Extract the text of all pages as a single String.
			String output = new PDFTextStripper().getText(document);
			System.out.println(output);
			return output;
		}
	}

}
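
Text extracted from a PDF often carries line breaks and repeated spaces that follow the page layout, so an exact contains() check can fail even though the expected sentence is present. The helper below is a minimal sketch of a whitespace-tolerant check using only the Java standard library. The class name PdfTextCheck, its method names, and the sample string in main are hypothetical and not part of PDFBox or Selenium.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox or Selenium): compares text while
// ignoring differences in line breaks and repeated spaces.
final class PdfTextCheck {

    // Matches any run of whitespace: spaces, tabs, \r, \n, and so on.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Collapses every whitespace run to a single space and trims the ends.
    static String normalizeWhitespace(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Returns true if 'expected' occurs in 'pdfContent' once both are normalized.
    static boolean containsIgnoringLayout(String pdfContent, String expected) {
        return normalizeWhitespace(pdfContent).contains(normalizeWhitespace(expected));
    }

    public static void main(String[] args) {
        // Made-up extraction result with a line break and extra spaces,
        // used here only to illustrate the check.
        String extracted = "Pdf995 makes it easy and affordable\r\nto create   professional-quality documents";
        System.out.println(containsIgnoringLayout(extracted,
                "Pdf995 makes it easy and affordable to create professional-quality documents"));
    }
}
```

In the test, the assertion could then read Assert.assertTrue(PdfTextCheck.containsIgnoringLayout(pdfContent, "...")) instead of calling contains() on the raw string. Whether this is needed depends on how the text is laid out in your PDF.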
