# VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

## stanav

Hello all,

I was recently working on a job assignment dealing with PDF files. My company produces hundreds of daily reports in PDF format, each for a specific division or sub-company. Some top executives want a single report that covers all divisions and sub-companies instead of looking at each one separately, so my job was to merge those reports into a single PDF file with bookmarks for easy navigation.

Originally I used the Acrobat COM object approach, but management didn't want to pay for a full version of Adobe Acrobat on every PC that runs my program, so I had to rewrite it without relying on Acrobat. I then found the open source PDFBox package, which can be downloaded here... Once you have downloaded the package and unzipped it to a directory on your local machine, you need to add the following references to your project:


```
IKVM.GNU.Classpath
IKVM.Runtime
PDFBox-0.7.3
```

To make a long story short, here are the steps I took:
1. Create a list of the PDF files to be merged.
2. Merge those PDF files into a temp file. The merge order follows the order of the items in the list.
3. Create a data table to hold the bookmark data. Each data row contains a bookmark title and the page number it points to.
4. Open the merged temp file, insert bookmarks into it using the info from the bookmark data table, then save it to a new file.
5. If everything succeeded, delete the temp file.
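Step 3 boils down to a running total: each file's bookmark points at the 0-based index of its first page in the merged document, which is the sum of the page counts of all the files before it. Here is a minimal sketch of just that arithmetic, with the page counts passed in directly. The `BookmarkOffsets` module and the `ComputeStartPages` name are made up for illustration; they are not part of PDFBox or of the attached source.

```vbnet
Imports System
Imports System.Collections.Generic

Module BookmarkOffsets

    ' Hypothetical helper: returns the 0-based start page of each file in the
    ' merged document, given each file's page count in merge order.
    Function ComputeStartPages(ByVal pageCounts As IList(Of Integer)) As List(Of Integer)
        Dim startPages As New List(Of Integer)
        Dim nextStart As Integer = 0
        For Each count As Integer In pageCounts
            startPages.Add(nextStart)   ' this file starts where the previous one ended
            nextStart += count
        Next
        Return startPages
    End Function

    Sub Main()
        ' Three reports with 3, 5 and 2 pages start at pages 0, 3 and 8.
        For Each startPage As Integer In ComputeStartPages(New Integer() {3, 5, 2})
            Console.WriteLine(startPage)
        Next
    End Sub

End Module
```

A file whose page count cannot be read contributes 0 pages in this scheme, so every bookmark after it would land too early; that case is worth checking before trusting the offsets.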

Code of interest:

```
'Namespaces used by the functions below (PDFBox 0.7.3 running on IKVM)
Imports System.Collections.Generic
Imports System.Data
Imports System.IO
Imports java.util
Imports org.pdfbox.pdmodel
Imports org.pdfbox.util
Imports org.pdfbox.pdmodel.interactive.documentnavigation.destination
Imports org.pdfbox.pdmodel.interactive.documentnavigation.outline

Private Function MergePdfFiles(ByVal pdfFileList As List(Of String), _
                               ByVal outputFileFullName As String) As Boolean
    Dim result As Boolean = False
    Dim pdfMerger As PDFMergerUtility = Nothing
    Dim fileCount As Integer = pdfFileList.Count
    If fileCount > 1 Then
        Try
            'Instantiate an instance of Pdf Merger Utility
            pdfMerger = New PDFMergerUtility()
            With pdfMerger
                'Set output destination
                .setDestinationFileName(outputFileFullName)
                'Loop thru the file list and add each source to the merger
                For i As Integer = 0 To fileCount - 1 Step 1
                    .addSource(pdfFileList(i))
                Next i
                'Merge the documents
                .mergeDocuments()
                result = True
            End With
        Catch ex As Exception
            WriteToLog("MergePDFFile(" & outputFileFullName & "): " & ex.Message)
            Return False
        End Try
    End If
    Return result
End Function

Private Function CreateBookmarkDataTable(ByVal pdfFileList As List(Of String)) As DataTable
    Dim bookmarkData As New DataTable
    Dim row As DataRow = Nothing
    Dim bookmarkTitle As String = String.Empty
    'Zero-based index of the first page of the current file in the merged document
    Dim pageNumber As Integer = 0
    Try
        bookmarkData.Columns.Add("BookmarkTitle", GetType(String))
        bookmarkData.Columns.Add("PageNumber", GetType(Integer))
        Dim count As Integer = pdfFileList.Count
        If count > 0 Then
            For i As Integer = 0 To count - 1 Step 1
                bookmarkTitle = Path.GetFileNameWithoutExtension(pdfFileList(i))
                row = bookmarkData.NewRow()
                row.Item("BookmarkTitle") = bookmarkTitle
                row.Item("PageNumber") = pageNumber
                bookmarkData.Rows.Add(row)
                'The next file starts right after this file's last page
                pageNumber += GetPageCount(pdfFileList(i))
            Next
        End If
    Catch ex As Exception
        WriteToLog("CreateBookmarkDataTable(): " & ex.Message)
        Return Nothing
    End Try
    Return bookmarkData
End Function

Private Function GetPageCount(ByVal pdfFile As String) As Integer
    Dim pageCount As Integer
    Dim pdfDoc As PDDocument = Nothing
    Try
        pdfDoc = PDDocument.load(pdfFile)
        pageCount = pdfDoc.getNumberOfPages()
    Catch ex As Exception
        WriteToLog("GetPageCount(" & pdfFile & "): " & ex.Message)
        Return 0
    Finally
        If Not pdfDoc Is Nothing Then
            pdfDoc.close()
        End If
    End Try
    Return pageCount
End Function

Private Function AddBookMarks(ByVal pdfFile As String, _
                              ByVal bookmarkTable As DataTable) As Boolean
    Dim result As Boolean = False
    Dim PdfDoc As PDDocument = Nothing
    Dim outFile As String = String.Empty
    Dim rowCount As Integer = bookmarkTable.Rows.Count
    Try
        If rowCount > 0 Then
            'Set the output file full path
            outFile = pdfFile.Replace("temp_", "")
            'Load the input pdf file
            PdfDoc = PDDocument.load(pdfFile)
            If Not PdfDoc.isEncrypted() Then
                'Create new document outline and assign it to the pdf document
                Dim outline As PDDocumentOutline = New PDDocumentOutline()
                PdfDoc.getDocumentCatalog().setDocumentOutline(outline)
                'Create new outline item for the document outline
                Dim pagesOutline As PDOutlineItem = New PDOutlineItem()
                pagesOutline.setTitle("All Pages")
                outline.appendChild(pagesOutline)
                'Get the list of pages in the document (java.util.List)
                Dim pages As List = PdfDoc.getDocumentCatalog().getAllPages()
                Dim i, pageNumber As Integer
                Dim row As DataRow = Nothing
                Dim bookmarkTitle As String = String.Empty
                'Loop thru the bookmark datatable and add bookmarks to the document accordingly
                For i = 0 To rowCount - 1 Step 1
                    'Read the row's data
                    row = bookmarkTable.Rows(i)
                    pageNumber = CInt(row.Item("PageNumber"))
                    bookmarkTitle = CStr(row.Item("BookmarkTitle"))
                    'Get the page at pageNumber from pages list
                    Dim page As PDPage = CType(pages.get(pageNumber), PDPage)
                    Dim dest As PDPageFitWidthDestination = New PDPageFitWidthDestination()
                    dest.setPage(page)
                    'Then set bookmark to it
                    Dim bookmark As PDOutlineItem = New PDOutlineItem()
                    bookmark.setDestination(dest)
                    bookmark.setTitle(bookmarkTitle)
                    'Add this bookmark to the document's outline
                    pagesOutline.appendChild(bookmark)
                Next i
                'Expand the bookmark tree
                pagesOutline.openNode()
                outline.openNode()
                'Save the document to a file
                PdfDoc.save(outFile)
                result = True
            Else
                WriteToLog("Can't add bookmarks to <" & pdfFile & "> because the document is encrypted.")
            End If
        Else
            WriteToLog("Can't add bookmarks to <" & pdfFile & "> because BookmarkTable has no data.")
        End If
    Catch ex As Exception
        WriteToLog("AddBookmarks(" & pdfFile & "): " & ex.Message)
        Return False
    Finally
        If Not PdfDoc Is Nothing Then
            PdfDoc.close()
        End If
    End Try
    Return result
End Function
```

The full source code is attached (it's a console application)

----------


## PENNYSTOCK

This looks like it's going to save a lot of time.

Thanks!!!!!!!!!!!

----------


## tzmjoseph

I tried implementing this in VB.Net console application but got the following error when running the application. The error occurred at the mergeDocuments call.

*Error: destination PDF is encrypted, can't append encrypted PDF documents.*

I used LinkedLists instead of List and modified the code to work for this collection type.

Could you tell me what may be going wrong? Do I need to give rights to some user/group on the source/destination folders?

Thanks.

----------


## Joe_Pradeep_kumar

Hi,
I get an exception (NullReferenceException: Object reference not set to an instance of an object) at mergeDocuments() of the PDFMergerUtility. Here is my code:

```
Imports System.IO
Imports org.pdfbox.pdmodel
Imports org.pdfbox.util
Imports org.pdfbox.pdmodel.interactive.documentnavigation.destination
Imports org.pdfbox.pdmodel.interactive.documentnavigation.outline
Imports java.util

Module Module1

    Sub Main()

        'Create a pdf file list and add files to it
        Dim pdfList(2) As String
        pdfList(0) = "C:\reports\pdfFile1.pdf"
        pdfList(1) = "C:\reports\pdfFile2.pdf"

        Dim outFile As String = "C:\MergedPdf\temp_myMergedPdf.pdf"

        'Try to merge the pdf files
        If MergePdfFiles(pdfList, outFile) Then
            Console.WriteLine(" The files were merged!")
            Console.ReadLine()

        End If
    End Sub

    Private Function MergePdfFiles(ByVal pdfFileList As Array, _
                                   ByVal outputFileFullName As String) 
        Dim result As Boolean = False

        Dim fileCount As Integer = 2
        If fileCount > 1 Then
            Try
                'Instantiate an instance of Pdf Merger Utility
                Dim pdfMerger As New PDFMergerUtility
                With pdfMerger
                    'Set output destination
                    .setDestinationFileName(outputFileFullName)
                    'Looping thru the file list and add source to the merger
                    For i As Integer = 0 To fileCount - 1 Step 1
                        .addSource(pdfFileList(i))
                    Next i
                    'Merge the documents
                    pdfMerger.mergeDocuments()
                    result = True
                End With
            Catch ex As Exception
            End Try
        End If
        Return result
    End Function
```

Now here's the catch: when I converted this application to VB.NET 2.0 and ran it on another system, the above code worked!

Where did I go wrong?

----------


## stanav

> I tried implementing this in VB.Net console application but got the following error when running the application. The error occurred at the mergeDocuments call.
> 
> *Error: destination PDF is encrypted, can't append encrypted PDF documents.*
> 
> I used LinkedLists instead of List and modified the code to work for this collection type.
> 
> Could you tell me what may be going wrong? Do I need to give rights to some user/group on the source/destination folders?
> 
> Thanks.


The error itself explains it all: it appears that one of your PDF files is either encrypted or password protected, and PDFBox can't read that file.
As for file access permissions, it should be just standard stuff. That is, the account that runs the code needs read permission on each input file and write permission on the folder where the output file is written. If the input files and the output file reside in the same folder, then the account running the code needs both read and write permission on that folder.
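To find out which source file is the culprit, one option is to load each file on its own and ask PDFBox whether it is encrypted before handing the list to the merger. Here is a rough sketch that reuses the same `PDDocument.load`, `isEncrypted()` and `close()` calls as `AddBookMarks` in the first post. The `EncryptionCheck` module and the `FindEncryptedPdfs` name are made up for illustration, and a file that fails to load at all is only reported to the console, not counted as encrypted.

```vbnet
Imports System
Imports System.Collections.Generic
Imports org.pdfbox.pdmodel

Module EncryptionCheck

    ' Hypothetical helper: returns the files that PDFBox reports as encrypted.
    Function FindEncryptedPdfs(ByVal pdfFiles As List(Of String)) As List(Of String)
        Dim encrypted As New List(Of String)
        For Each pdfFile As String In pdfFiles
            Dim pdfDoc As PDDocument = Nothing
            Try
                pdfDoc = PDDocument.load(pdfFile)
                If pdfDoc.isEncrypted() Then
                    encrypted.Add(pdfFile)
                End If
            Catch ex As Exception
                ' Unreadable for some other reason (missing file, corrupt PDF, ...)
                Console.WriteLine("Could not load " & pdfFile & ": " & ex.Message)
            Finally
                ' Always release the document, even if the check failed
                If pdfDoc IsNot Nothing Then
                    pdfDoc.close()
                End If
            End Try
        Next
        Return encrypted
    End Function

End Module
```

If the returned list is not empty, those are the files to look at first; they would presumably need to be re-exported without protection before the merge can succeed.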

----------


## vijy

Hi Stanav,
I am using PDFBox to merge a list of PDF files.
I referenced all the components you mentioned.
I am getting an error at

   PdfMerger.MergeDocuments()

   Error::: Expected an integer type, actual='BC 3  s#   \  C#  ¾  o¦   &} 4Ê2³               +5C² \C '


Here is the code:



```
    Private Sub frmMergingPdf_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        'Create a pdf file list and add files to it
        Dim pdfList As New List(Of String)
        pdfList.Add("d:\file1.pdf")
        pdfList.Add("d:\file2.pdf")
        Dim outFile As String = "d:\Pdf.pdf"
        MergePdfFiles(pdfList, outFile)
    End Sub

    Private Function MergePdfFiles(ByVal pdfFileList As List(Of String), ByVal outputFileFullName As String)
        Dim result As Boolean = False
        Dim fileCount As Integer = 2
        If fileCount > 1 Then
            Try
                'Instantiate an instance of Pdf Merger Utility
                Dim pdfMerger As New PDFMergerUtility
                With pdfMerger
                    .setDestinationFileName(outputFileFullName)
                    For i As Integer = 0 To fileCount - 1 Step 1
                        .addSource(pdfFileList(i))
                    Next i
                    .mergeDocuments() 'This is where I get the error
                    result = True
                End With
            Catch ex As Exception
            End Try
        End If
        Return result
    End Function
```

----------


## stanav

@vijy: Sometimes PDFBox runs into internal errors that I can't fix, such as the one you're having (mergeDocuments() is a public member of the PDFMergerUtility class and we have no control over what it does internally). A better alternative is to use iTextSharp. It's faster and more reliable.

----------


## Lumpynifkin

This is a great tool. Can anyone show me how to link this to an Excel spreadsheet that has rows of PDF files to be merged?

I am new to VB but pretty experienced in programming, so even getting me started in the right direction would be a big help.

Thanks,

Will

----------


## stanav

You'd use ADO.NET to read the data from your Excel file. There are plenty of examples of that on this website. Just search for something like "Excel ADO.Net" and you should get some hits. Once you've read the data from your xls file into your program, it's just a matter of building a list of files to be merged and passing it to the merge function.
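As a starting point, here is a rough sketch of reading one file path per row from an .xls workbook with the OLE DB classes in ADO.NET. It assumes the Jet 4.0 OLE DB provider is available (which means running as a 32-bit process) and that the first row of the sheet holds column headers. The `ExcelFileList` module, the function names, and the sheet and column names passed in are all made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data.OleDb

Module ExcelFileList

    ' Hypothetical helper: builds a Jet connection string for an .xls workbook
    ' whose first row contains the column headers (HDR=YES).
    Function BuildExcelConnectionString(ByVal workbookPath As String) As String
        Return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & workbookPath & _
               ";Extended Properties=""Excel 8.0;HDR=YES"""
    End Function

    ' Hypothetical helper: reads one file path per row from the given sheet and column.
    Function ReadPdfList(ByVal workbookPath As String, _
                         ByVal sheetName As String, _
                         ByVal columnName As String) As List(Of String)
        Dim pdfFiles As New List(Of String)
        Using conn As New OleDbConnection(BuildExcelConnectionString(workbookPath))
            conn.Open()
            ' A worksheet is addressed as a table named [SheetName$]
            Dim sql As String = "SELECT [" & columnName & "] FROM [" & sheetName & "$]"
            Using cmd As New OleDbCommand(sql, conn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        ' Skip empty cells
                        If Not reader.IsDBNull(0) Then
                            pdfFiles.Add(reader.GetValue(0).ToString())
                        End If
                    End While
                End Using
            End Using
        End Using
        Return pdfFiles
    End Function

End Module
```

The resulting list can then be passed straight to `MergePdfFiles` from the first post, for example `MergePdfFiles(ReadPdfList(workbookPath, "Sheet1", "PdfPath"), outFile)`, where "Sheet1" and "PdfPath" stand in for whatever your workbook actually uses.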

----------


## szlamany

PDFBox seems like an interesting product...

Do you use it to initially create the PDF documents that you talk about in the first post here?

What else can the PDFBox product do?

----------


## stanav

> PDFBox seems like an interesting product...
> 
> Do you use it to initially create the PDF documents that you talk about in the first post here?
> 
> What else can the PDFBox product do?


PDFBox is used mainly for creating and manipulating pdf files on the fly. It's a pretty good product. However, I like iText/iTextSharp better because it is faster and doesn't add another 16MB of dependencies to my application as PDFBox does.

----------


## Lumpynifkin

I am having a problem using the code you show above. When I merge two copies of the same file I have no problem, but if I try to merge two different PDF files, PDFBox throws an exception and only the temp file is created.

The exception that is thrown is COSVisitorException.

Why can I merge two copies of the same file but not two different files?

----------


## stanav

That exception is thrown by PDFBox itself, not by my code. My recommendation is to use iTextSharp instead, since I find iTextSharp faster and more reliable for creating and manipulating PDF files. Also, iTextSharp's footprint is a lot smaller than PDFBox's. I myself have stopped using PDFBox and have converted all of my programs that used it to iTextSharp.

Search this forum. I do have a thread or two on manipulating pdf files using iTextSharp.

----------


## Sommerfeld0426

I wanted to thank you soooo much for this post!! I have been searching for weeks on how to merge an unknown number of pdf files and your code led me right down that path. 

Thanks again

----------


## gloriannl

This works perfectly on my local machine, but when I moved the executable file and the DLLs to our production server, the merged PDF shows an error message when I open it. It says, "Could not find the XObject named 'XIPLAYER0'." Does anyone know why?

Any help or guidance I can get, would be appreciated.
Thanks!

----------


## arch99

I am adding the script to a Script Task in SSIS, and I don't know how to add references to the following in my project:

IKVM.GNU.Classpath
IKVM.Runtime
PDFBox-0.7.3
Thanks,
Arch

----------


## BigJRofC

This is great!

----------


## Raj_kumar

Hello,

I have a requirement to split an existing PDF file into multiple PDF files. Say one big PDF file contains 10 bills; I have to extract each bill, which may span multiple pages, into a separate file.

I know the page on which each bill starts and the page on which it ends.

Can anyone suggest an SSIS package to do that?

----------


## Delaney

have a look here  https://www.codeproject.com/Question...file-in-vb-net

and here https://www.e-iceblue.com/Tutorials/...-C-VB.NET.html

----------


## KipoyRavena

Hi, how can I do this in Visual Basic 6.0?

----------


## techgnome

> Hi, how can I do this in Visual Basic 6.0?


Short answer: you don't.
Slightly longer answer: You might be able to if you build the .NET version with the "Make COM Visible" (or something like that) option turned on. Then you might (might being the operative word) be able to reference it in VB6. But... that just feels like working with a house of cards.

-tg

----------


## KipoyRavena

Do you know of any other way I can merge many PDF files using VB 6.0?

----------

