HTML to PDF

Have you ever wanted a function to convert HTML to PDF? It is really easy. A good use for this is when you have a resume on your website and want to offer it as a PDF download. Mind you, you will have to format your resume properly so that it looks good in both cases. That is not the easiest job, but it is well worthwhile to have a resume download created on demand.

Converting a PDF to Excel

Converting a PDF to Excel using InvestInTech.com’s PDF to Excel SDK.

PdfSharp and other SDKs can read text from a PDF; however, I found that InvestInTech’s PDF to Excel kept the data in grid form.
This made it easier to use OLEDB to query the Excel file and strip out the data as needed. I also tried InvestInTech’s XML conversion, but it did not give the same clean results.
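
To make the OLEDB step more concrete, here is a minimal sketch of how a converted .xls file could be queried. It is not the code from the original project: the ExcelGridReader class, the sheet name passed in, and the choice of the Jet 4.0 provider are all assumptions for illustration. Jet is 32-bit only, so the process would need to run as x86 (or use a different provider).

```csharp
using System;
using System.Data;
using System.Data.OleDb;

// Hypothetical helper, not part of the original project.
public static class ExcelGridReader
{
    // Builds a Jet connection string for a legacy .xls workbook.
    // HDR=No treats the first row as data (columns come back as F1, F2, ...);
    // IMEX=1 asks the driver to read mixed-type columns as text.
    public static string BuildConnectionString(string xlsPath)
    {
        return string.Format(
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};" +
            "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\"",
            xlsPath);
    }

    // Loads every row of one worksheet into a DataTable.
    // The sheet name is an assumption; the converter may name sheets differently.
    public static DataTable ReadSheet(string xlsPath, string sheetName)
    {
        DataTable table = new DataTable();
        string query = "SELECT * FROM [" + sheetName + "$]";

        using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(xlsPath)))
        using (OleDbDataAdapter adapter = new OleDbDataAdapter(query, connection))
        {
            // Fill opens and closes the connection on its own.
            adapter.Fill(table);
        }
        return table;
    }
}
```

A call such as ExcelGridReader.ReadSheet(path, "Sheet1") would return the grid as rows and columns, which can then be filtered in code before the values are written to a database.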

Here is a sample of how I accomplished the conversion.

String directory = @"C:\temp";

// Only pick up PDFs: every file returned here is converted and then deleted.
String[] files = Directory.GetFiles(directory, "*.pdf");

Int32 iCount = 0;
foreach (String file in files)
{
    iCount++;
    toolStripStatusLabel2.Text = " - Converting PDF to Excel File# " + iCount + " of " + files.Length;
    Application.DoEvents(); // let the status bar repaint during the long-running loop

    CPDF2ExcelClass pdf2Excel = new CPDF2ExcelClass();
    IPDF2Excel iPDF2Excel = pdf2Excel;

    // Write the .xls next to the source PDF; ChangeExtension only touches the final extension.
    iPDF2Excel.PDF2Excel(file, Path.ChangeExtension(file, ".xls"));

    toolStripStatusLabel2.Text = " - Converting PDF to Tiff File# " + iCount + " of " + files.Length;
    Application.DoEvents();
    ConvertPDFToTiff(file); // helper not shown in this post

    // The source PDF is no longer needed once both conversions are done.
    File.Delete(file);
}
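
One detail worth care in the loop above is how the .xls name is derived from the PDF path. A plain string replace of ".pdf" misses upper-case extensions and rewrites every occurrence of ".pdf" in the path, whereas Path.ChangeExtension only touches the final extension. Here is a small sketch of the difference; the OutputNames class and the file names are hypothetical and for illustration only.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration.
public static class OutputNames
{
    // Naive approach: replaces every literal ".pdf" in the string, case-sensitively.
    public static string ToExcelPathNaive(string pdfPath)
    {
        return pdfPath.Replace(".pdf", ".xls");
    }

    // Safer approach: swaps only the final extension, whatever its casing.
    public static string ToExcelPath(string pdfPath)
    {
        return Path.ChangeExtension(pdfPath, ".xls");
    }
}

public static class OutputNamesDemo
{
    public static void Run()
    {
        // Upper-case extension: the naive version leaves the name unchanged,
        // so the converter would be asked to write over its own input.
        Console.WriteLine(OutputNames.ToExcelPathNaive("report.PDF")); // report.PDF
        Console.WriteLine(OutputNames.ToExcelPath("report.PDF"));      // report.xls

        // ".pdf" appearing twice: the naive version rewrites both occurrences.
        Console.WriteLine(OutputNames.ToExcelPathNaive("my.pdf.archive.pdf")); // my.xls.archive.xls
        Console.WriteLine(OutputNames.ToExcelPath("my.pdf.archive.pdf"));      // my.pdf.archive.xls
    }
}
```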

How To Split A PDF Using PdfSharp

I recently completed a project that broke a PDF file into multiple files, converted each of them to an MS Excel file, and ultimately loaded the data into a database.

This segment covers splitting a multi-page Adobe PDF file into individual pages. I do this step mainly because we store each individual page in a document management system along with the data that we strip from it. Our data entry clerks use that page later.

When breaking the file down into individual pages, we need to preserve the integrity of each original page. This ensures that we can still convert the page to an Excel file to get the data from it.

There are many open-source and free PDF SDKs that you can try. I had the best luck doing most PDF work with PdfSharp ( http://pdfsharp.com/PDFsharp/ ). Here is a modified code segment showing how you can use PdfSharp to split a file.

Int32 iCount = 0;

// Import mode is required so that pages can be copied into another document.
PdfDocument inputDocument = PdfReader.Open(filename, PdfDocumentOpenMode.Import);
String directory = @"C:\temp";
string name = Path.GetFileNameWithoutExtension(filename);
for (int idx = 0; idx < inputDocument.PageCount; idx++)
{
    iCount++;
    toolStripStatusLabel2.Text = " - Processing File# " + iCount + " of " + inputDocument.PageCount;
    Application.DoEvents(); // let the status bar repaint

    // Create a new single-page document that carries over the source metadata
    PdfDocument outputDocument = new PdfDocument();
    outputDocument.Version = inputDocument.Version;
    outputDocument.Info.Title = String.Format("Page {0} of {1}", idx + 1, inputDocument.Info.Title);
    outputDocument.Info.Creator = inputDocument.Info.Creator;

    // Add the page and save it as "<name> - Page N.pdf"
    outputDocument.AddPage(inputDocument.Pages[idx]);
    outputDocument.Save(Path.Combine(directory, String.Format("{0} - Page {1}.pdf", name, idx + 1)));
}