# [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

## stanav

This thread was originally about extracting and merging pdf files using iTextSharp. Over time, however, I have added a lot more code for other tasks and put it all together into a handy class called PdfManipulation. There are 2 classes, listed below (choose the one that matches the iTextSharp version you're using):

1. The original PdfManipulation.vb class is written against iTextSharp version 4. This class is obsolete and no longer maintained.

2. The updated PdfManipulation2.vb class is for the newer iTextSharp version 5. This class also contains a lot more methods than the original one, and I highly recommend it over the old one. I will update this class from time to time to fix bugs and/or add more functionality. Consider it a work in progress. *>>>> Last updated on 4/9/2012 <<<<*

_Please verify the version of iTextSharp you're using and download the correct class._

The current version of PdfManipulation2 class supports AES_256 encryption provided that your itextsharp.dll version is 5.1.x or higher.

Below is the list of public methods in the new PdfManipulation2 class:

vb.net Code:
'Remove all restrictions from a pdf file
    Public Shared Function RemoveRestrictions(ByVal restrictedPdf As String, Optional ByVal password As String = Nothing, Optional ByVal saveABackup As Boolean = True) As Boolean
    
    'Parse text from a specified range of pdf pages    
    Public Shared Function ParsePdfText(ByVal sourcePDF As String, _
                                  Optional ByVal fromPageNum As Integer = 0, _
                                  Optional ByVal toPageNum As Integer = 0) As String
    
    'Parse all text from a pdf
    Public Shared Function ParseAllPdfText(ByVal sourcePDF As String) As Dictionary(Of Integer, String)
    
    'Page-to-page comparison of 2 pdf files; writes the differences to a resulting text file
    Public Shared Sub ComparePdfs(ByVal pdf1 As String, ByVal pdf2 As String, _
                                  ByVal resultFile As String, _
                                  Optional ByVal fromPageNum As Integer = 0, _
                                  Optional ByVal toPageNum As Integer = 0)
   
    'Extract specified pages from a pdf to create a new pdf
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)

    'Split a pdf into a specified number of pdfs
    Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)
    
    'Split a pdf into multiple pdfs each containing a specified number of pages.  
    Public Shared Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
    
    'Extract pages from multiple source pdfs and merge into a final pdf    
    Public Shared Sub ExtractAndMergePdfPages(ByVal sourceTable As DataTable, ByVal outPdf As String)
     
    'Set security password on an existing pdf file  
    Public Shared Sub SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)
     
    'Add watermark to pdf pages using an image   
    Public Shared Sub AddWatermarkImage(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkImage As String)
    
    'Add watermark to all pdf pages using text
    Public Shared Sub AddWatermarkText(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkText() As String, _
                                       Optional ByVal watermarkFont As iTextSharp.text.pdf.BaseFont = Nothing, _
                                       Optional ByVal watermarkFontSize As Single = 48, _
                                       Optional ByVal watermarkFontColor As iTextSharp.text.BaseColor = Nothing, _
                                       Optional ByVal watermarkFontOpacity As Single = 0.3F, _
                                       Optional ByVal watermarkRotation As Single = 45.0F)

    'Merge multiple pdfs into a single one.
    Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String, _
                                         Optional ByVal authorName As String = "", _
                                         Optional ByVal creatorName As String = "", _
                                         Optional ByVal subject As String = "", _
                                         Optional ByVal title As String = "", _
                                         Optional ByVal keywords As String = "") As Boolean

    'Merge multiple pdfs into one with all bookmarks preserved
    Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean
        
    'Add document outline (bookmarks) to a pdf
    Public Shared Sub AddDocumentOutline(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal outlineTable As System.Data.DataTable)
     
    'Extract urls from a pdf   
    Public Shared Function ExtractURLs(ByVal sourcePdf As String, Optional ByVal pageNumbers() As Integer = Nothing) As System.Data.DataTable
        
    'Extract images from a pdf
    Public Shared Function ExtractImages(ByVal sourcePdf As String) As List(Of Image)
     
    'Fill a form
    Public Shared Sub FillAcroForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)

    Public Shared Sub FillMyForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)

    'Add annotation
    Public Shared Sub AddTextAnnotation(ByVal sourcePdf As String, ByVal outputPdf As String)

    Public Shared Function GetAcroFieldData(ByVal sourcePdf As String) As Dictionary(Of String, String)
        
    Public Shared Function GetPdfSummary(ByVal sourcePdf As String) As DataTable
        
    Public Shared Function ReplacePagesWithBlank(ByVal sourcePdf As String, _
                                                 ByVal pagesToReplace As List(Of Integer), _
                                                 ByVal outPdf As String, _
                                                 Optional ByVal templatePdf As String = "") As Boolean
       
    Public Shared Function InsertPages(ByVal sourcePdf As String, _
                                       ByVal pagesToInsert As Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage), _
                                       ByVal outPdf As String) As Boolean
       
    Public Shared Function RemovePages(ByVal sourcePdf As String, ByVal pagesToRemove As List(Of Integer), ByVal outputPdf As String) As Boolean
     
    'A demo on how to draw various shapes in itextsharp   
    Public Shared Sub DrawShapesDemo(ByVal sourcePdf As String, ByVal outputPdf As String)
         
    Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)


Any comments are welcome.
Happy coding!
Stanav.

----------


## nbrege

Stanav ... thanks for posting these code samples. They helped me on a project that I am currently working on. I would like to request that you post another sample: I need to be able to extract specified pages from multiple documents and save them to one combined PDF, i.e. take pages 3 & 7 from Doc1.pdf, 4-6 from Doc2.pdf, and 1, 5 & 12 from Doc3.pdf and save them in Doc4.pdf. Is this "do-able"?

----------


## stanav

Yes, it's doable. However, I'm on vacation right now and I do not have access to my work computer, which has all the tools needed to write code. What you can do right now is create a function that returns a hashtable or a dictionary with the file names (string) as the keys and the pages to extract (integer array) as the values. Once you have this hashtable/dictionary, you can modify the ExtractPdfPage sub so that it creates a single new pdf file and then loops through the hashtable/dictionary to extract the pages and add them to the output pdf. It's just a matter of setting up the loop so that each iteration reads one entry and extracts the pages from that file.
If you can wait until later this week when I return to work, I can try to come up with something for you in code.
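
The loop described above can be sketched roughly as follows. This is only a minimal sketch, not the method that was later attached to the thread: the module name, `ExtractAndMergeFromDictionary`, and the dictionary layout are assumptions made for illustration, and it relies only on the `PdfReader`, `Document` and `PdfCopy` calls already used by the ExtractPdfPage sub.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module PdfMergeSketch
    ' Hypothetical helper: each key is the path of a source pdf, each value
    ' holds the 1-based page numbers to copy from that file.
    Public Sub ExtractAndMergeFromDictionary(ByVal pagesByFile As Dictionary(Of String, Integer()), _
                                             ByVal outPdf As String)
        Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
        Dim doc As New iTextSharp.text.Document()
        Dim pdfCpy As New iTextSharp.text.pdf.PdfCopy(doc, New FileStream(outPdf, FileMode.Create))
        Try
            doc.Open()
            ' One pass per source file; pages are appended in the order listed.
            For Each entry As KeyValuePair(Of String, Integer()) In pagesByFile
                Dim reader As New iTextSharp.text.pdf.PdfReader(entry.Key)
                readers.Add(reader)
                For Each pageNum As Integer In entry.Value
                    pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, pageNum))
                Next
            Next
        Finally
            ' Close the output first, then the readers it copied from.
            If doc.IsOpen() Then doc.Close()
            For Each reader As iTextSharp.text.pdf.PdfReader In readers
                reader.Close()
            Next
        End Try
    End Sub
End Module
```

A `Dictionary` does not promise any enumeration order, so if the order of the source files matters, a `List(Of KeyValuePair(Of String, Integer()))` may be the safer container.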
Best regards,
Stanav.

----------


## nbrege

If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing.  Enjoy the rest of your vacation...

----------


## stanav

> If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing.  Enjoy the rest of your vacation...


I've added a method to do what you need. Since the total text is more than 1000 characters, I had to put all the code into a class (PdfManipulation.vb) and post it as an attachment. Hope it helps.

----------


## gaigoi113

Hi Stanav,

   Do you have any code sample that will convert pdf to multipage tiff? - thanks

----------


## stanav

> Hi Stanav,
> 
>    Do you have any code sample that will convert pdf to multipage tiff? - thanks


It's impossible to use iTextSharp to convert pdf to multipage tiff. However, you can use PDFBox to convert each pdf page to an image file (it only outputs to jpg's or png's), then merge these images into a multipage tiff.

To download PDFBox, go here:
http://www.pdfbox.org/index.html

To merge multiple images into 1 multipage tiff, check out this codeproject article:
http://www.codeproject.com/KB/GDI-pl...ipageTiff.aspx
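
The image-merging step can also be done with GDI+ alone. The following is a minimal sketch, assuming the page images have already been rendered to files by some other tool; `SaveAsMultipageTiff` is a hypothetical name and this is not the code from the linked article.

```vb
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging

Public Module TiffSketch
    ' Hypothetical helper: writes the given image files, in order, as the
    ' pages of a single multipage tiff.
    Public Sub SaveAsMultipageTiff(ByVal imageFiles() As String, ByVal outTiff As String)
        If imageFiles Is Nothing OrElse imageFiles.Length = 0 Then
            Throw New ArgumentException("No images supplied")
        End If

        ' Locate the built-in tiff encoder.
        Dim tiffCodec As ImageCodecInfo = Nothing
        For Each codec As ImageCodecInfo In ImageCodecInfo.GetImageEncoders()
            If codec.MimeType = "image/tiff" Then tiffCodec = codec
        Next
        If tiffCodec Is Nothing Then Throw New InvalidOperationException("No tiff encoder found")

        Dim saveFlag As System.Drawing.Imaging.Encoder = System.Drawing.Imaging.Encoder.SaveFlag
        Using encParams As New EncoderParameters(1)
            Using firstPage As Image = Image.FromFile(imageFiles(0))
                ' The first image opens the multi-frame file...
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.MultiFrame))
                firstPage.Save(outTiff, tiffCodec, encParams)

                ' ...each further image is appended as a new page...
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.FrameDimensionPage))
                For i As Integer = 1 To imageFiles.Length - 1
                    Using nextPage As Image = Image.FromFile(imageFiles(i))
                        firstPage.SaveAdd(nextPage, encParams)
                    End Using
                Next

                ' ...and a final flush closes the file.
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.Flush))
                firstPage.SaveAdd(encParams)
            End Using
        End Using
    End Sub
End Module
```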

And good luck.

----------


## MasterRipper

Hi all.

I know this thread is old, but I am using the iTextSharp library in this exact way.

I have a PDF with 4 pages and use this code to extract page 3 in a quick example prog I made.

However, the original PDF has text fields I can edit (AcroFields), and after extraction the 3rd page loses these fields.

Any idea(s) what I can change / do to keep these editable fields in the resulting page 3?

Thanks.

----------


## cthai

Hi,

I'm trying to extract a single page from a multi page pdf and I'm using the code below; however, I'm getting an error that it's not recognizing <param name>. Any help would be great. Thanks.



```
''' <summary>
    ''' Extract a single page from source pdf to a new pdf
    ''' </summary>
    <param name="sourcePdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"</param>
    <param name="pageNumberToExtract">"P1T1"</param>
    <param name="outPdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40a.pdf"</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPage(ByVal sourcePdf As String, ByVal pageNumberToExtract As Integer, ByVal outPdf As String)
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim doc As iTextSharp.text.Document = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Try
            reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
            doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
            pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
            doc.Open()
            page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
            pdfCpy.AddPage(page)
            doc.Close()
            reader.Close()
        Catch ex As Exception
            Throw ex
        End Try
    End Sub
```

----------


## stanav

Why are you putting your arguments in the code comments? That's not how you do it. You need to call the sub and pass in your arguments, something like this:

vb.net Code:
'Specify the path to the source pdf file
Dim sourcePdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"

'Extract page # 2 from the above pdf file
Dim pageNumberToExtract As Integer = 2

'And then save it to a new pdf named 'table40_page2.pdf'
Dim outputPdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40_page2.pdf"

'Call the sub somewhere in your program, passing in the above arguments
PdfManipulation.ExtractPdfPage(sourcePdf, pageNumberToExtract, outputPdf)

----------


## slow&steady

Stanav :

I have tried iTextSharp for putting watermarks on pdfs. It worked fine.

Now I am trying to change the header on existing pdf files to a desired header.

Is it possible?

If it's possible, then I will try to use it on a bunch of pdf files in one single folder.

Thanks for the help 

Sri

----------


## stanav

> Stanav :
> 
> I have tried iTextSharp for putting watermarks on pdfs. It worked fine.
> 
> Now I am trying to change the header on existing pdf files to a desired header.
> 
> Is it possible?
> 
> If it's possible, then I will try to use it on a bunch of pdf files in one single folder.
> ...


Yes, it's possible to add/change the header/footer of an existing pdf file and save the result to a new file. Please post your question in the VB.Net forum, because it's a different subject and doesn't belong in this code bank thread.

----------


## vijy

Hi Stanav,
  Is it possible to extract the PDF pages with bookmarks?

----------


## stanav

> Hi Stanav,
>   Is it possible to extract the PDF pages with bookmarks?


Yes, I THINK it is quite possible, but it would involve much more work (obviously). I gave it a shot, as seen in the code below, but frankly the method I was using only works to some extent: it only preserves the 1st-level bookmarks. My approach was to export the bookmarks in the original pdf to a collection, select the pages to be extracted from the reader, and use PdfStamper to copy the original pdf (now with only the selected pages) to a new pdf. Since PdfStamper automatically preserves ALL the bookmarks from the original, I had to edit the bookmark collection to remove the unused ones. This approach should work, but I don't know why it only preserves 1st-level bookmarks. Some more work is needed to work that bug out, but I don't have the time right now. I will post just what I have so far.

vb.net Code:
''' <summary>
    ''' Extract pages from an existing pdf file to create a new pdf with bookmarks preserved
    ''' </summary>
    ''' <param name="sourcePdf">full path to the source pdf</param>
    ''' <param name="pageNumbersToExtract">an integer array containing the page number of the pages to be extracted</param>
    ''' <param name="outPdf">the full path to the output pdf</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim outlines As System.Collections.ArrayList = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
        Dim hshTable As System.Collections.Hashtable = Nothing
        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            outlines = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)
            reader.SelectPages(New System.Collections.ArrayList(pageNumbersToExtract))
            stamper = New iTextSharp.text.pdf.PdfStamper(reader, New IO.FileStream(outPdf, IO.FileMode.Create))
            RemoveUnusedBookmarks(outlines, pageNumbersToExtract)
            stamper.Outlines = outlines
            stamper.Close()
            reader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub

    Private Shared Sub RemoveUnusedBookmarks(ByRef bookmarks As System.Collections.ArrayList, ByVal pagesToKeep() As Integer)
        Dim bookmark As System.Collections.Hashtable = Nothing
        Dim obj As Object = Nothing
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            obj = bookmarks(i)
            If TypeOf obj Is System.Collections.ArrayList Then
                RemoveUnusedBookmarks(DirectCast(obj, System.Collections.ArrayList), pagesToKeep)
            ElseIf TypeOf obj Is System.Collections.Hashtable Then
                bookmark = DirectCast(obj, System.Collections.Hashtable)
                If bookmark.ContainsKey("Page") Then
                    Dim value As String = DirectCast(bookmark.Item("Page"), String)
                    If Not String.IsNullOrEmpty(value) Then
                        Dim parts() As String = value.Split(" "c)
                        If parts.Length > 0 Then
                            Dim pageNum As Integer = -1
                            If Integer.TryParse(parts(0), pageNum) Then
                                Dim idx As Integer = System.Array.IndexOf(pagesToKeep, pageNum)
                                If idx < 0 Then
                                    bookmarks.Remove(obj)
                                Else
                                    parts(0) = (idx + 1).ToString
                                    value = String.Join(" ", parts)
                                    bookmark.Item("Page") = value
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next
    End Sub
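
One possible explanation for the first-level limitation, offered as an assumption rather than a verified fix: in the list returned by SimpleBookmark.GetBookmark, every entry is a Hashtable, and a bookmark's children sit in an ArrayList stored under that Hashtable's "Kids" key, so a loop that only looks for nested ArrayList elements never reaches them. The sketch below recurses through "Kids" instead. `PruneBookmarks` is a hypothetical helper, it assumes the iTextSharp 4 collection types (ArrayList/Hashtable), and it has not been run against real pdfs.

```vb
Imports System
Imports System.Collections

Public Module BookmarkPruneSketch
    ' Hypothetical helper: removes bookmarks whose target page was not
    ' extracted, renumbers the rest, and descends into each "Kids" list.
    ' Returns True when at least one bookmark in the list survives.
    Public Function PruneBookmarks(ByVal bookmarks As ArrayList, ByVal pagesToKeep() As Integer) As Boolean
        If bookmarks Is Nothing Then Return False
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            Dim bookmark As Hashtable = TryCast(bookmarks(i), Hashtable)
            If bookmark Is Nothing Then Continue For

            ' Children first: they live under the "Kids" key.
            Dim kidsSurvive As Boolean = PruneBookmarks(TryCast(bookmark("Kids"), ArrayList), pagesToKeep)
            If Not kidsSurvive Then bookmark.Remove("Kids")

            ' "Page" looks like "3 Fit"; the first token is the page number.
            Dim targetDropped As Boolean = False
            Dim value As String = TryCast(bookmark("Page"), String)
            If Not String.IsNullOrEmpty(value) Then
                Dim parts() As String = value.Split(" "c)
                Dim pageNum As Integer
                If Integer.TryParse(parts(0), pageNum) Then
                    Dim idx As Integer = Array.IndexOf(pagesToKeep, pageNum)
                    If idx >= 0 Then
                        parts(0) = (idx + 1).ToString()
                        bookmark("Page") = String.Join(" ", parts)
                    Else
                        targetDropped = True
                    End If
                End If
            End If

            If targetDropped AndAlso kidsSurvive Then
                ' Keep the parent as a plain heading for its remaining children.
                bookmark.Remove("Page")
                bookmark.Remove("Action")
            ElseIf targetDropped Then
                bookmarks.RemoveAt(i)
            End If
        Next
        Return bookmarks.Count > 0
    End Function
End Module
```

Keeping a parent whose own page was dropped, with its "Page" and "Action" entries removed, is a design choice of this sketch; removing the parent and promoting its children would be an equally reasonable alternative.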

Another approach I thought of was to export the original bookmarks to an XML file and edit that file. Once done, import it back into the new pdf file (which contains only the extracted pages). But like I said, I currently do not have a lot of free time to play with it. So I leave it to you to try.

Good luck.

----------


## vijy

Thanks stanav... 
yep i tried and i get... 


Splitting Code:
Public Function SplitPdfFiles(ByVal iStartPage As Integer, ByVal iEndPage As Integer, ByVal sPDFPath As String) As Boolean
        Try
            'Variables to hold the split file informations
           
            Dim reader As PdfReader = New PdfReader(sPDFPath)
            reader.RemoveUnusedObjects()
            reader.ConsolidateNamedDestinations()
             Dim importedPage As PdfImportedPage = Nothing
            Dim currentDocument As New Document
            Dim pdfWriter As PdfSmartCopy = Nothing
             
            Dim bIsFirst As Boolean = True
            For j As Integer = iStartPage To iEndPage
                If bIsFirst Then
                    bIsFirst = False
                    currentDocument = New Document(reader.GetPageSizeWithRotation(1))
                    pdfWriter = New PdfSmartCopy(currentDocument, New System.IO.FileStream(System.IO.Path.GetDirectoryName(sInFile) & "\" & sSplitName, System.IO.FileMode.Create))
                    pdfWriter.SetFullCompression()
                    ' pdfWriter.CompressionLevel = PdfStream.BEST_COMPRESSION
                    pdfWriter.PdfVersion = reader.PdfVersion
                    currentDocument.Open()
                End If
                 importedPage = pdfWriter.GetImportedPage(reader, j)
                pdfWriter.AddPage(importedPage)
            Next
             Dim bookMark As New ArrayList
            bookMark = SimpleBookmark.GetBookmark(reader)
          
            If bookMark IsNot Nothing Then
                SimpleBookmark.EliminatePages(bookMark, New Integer() {iEndPage + 1, reader.NumberOfPages})
                If iStartPage > 1 Then
                    'Drop bookmarks for the pages before the first extracted page (the range is inclusive)
                    SimpleBookmark.EliminatePages(bookMark, New Integer() {1, iStartPage - 1})
                    SimpleBookmark.ShiftPageNumbers(bookMark, -(iStartPage - 1), Nothing)
                End If
                pdfWriter.Outlines = bookMark
            End If
            currentDocument.Close()
            pdfWriter.Close()
            Return True
        Catch ex As Exception
        End Try
        Return False
    End Function

This one is working fine, and the pdf is extracted with its actual bookmarks.




> This approach should work but I don't know why it only preserves 1st level bookmarks


The problem is that it only preserves first-level bookmarks. Stanav, is it possible to get at least the child bookmarks collection?

----------


## selnahwy

Has anyone found a code example on how to convert PDF to image using iTextSharp or PDFBox?

----------


## mpires

Hi Stanav,

First, nice work. Your example helped me a lot, but I have a question.

I'm using "SplitPdfByPages" and it is working OK, but is there any reason why the extracted pdfs end up larger than the original, which has 5 pages?


Ex.: 

Original pdf with 5 pages (72KB)

I extract the 5 pages with your example code, and each page ends up at 85KB.

Is there any way to compress the extracted pages, or some reason for this?


Regards,

----------


## prabakarank

Hi,
I have used the "SplitPdfByPages" method, but I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting. It returns the following error: "Uri format is not supported".

Please give a solution for the above problem. Please do the needful.

----------


## stanav

> Hi,
> I have used the "SplitPdfByPages" method, but I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting. It returns the following error: "Uri format is not supported".
> 
> Please give a solution for the above problem. Please do the needful.


Download the file and save it to a temp location first. After that, you can split it as usual. If you don't need the original pdf once you're done splitting, you can delete it.
To download a file from an url, you can use a WebClient or simply use
My.Computer.Network.DownloadFile(url, saveLocation).
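
The download, process, delete sequence can be sketched as below, here with a WebClient. This is a minimal sketch under stated assumptions: `ProcessDownloadedPdf` is a hypothetical helper, and the actual work (for example the split) is passed in as a delegate so that the block does not depend on the attached class.

```vb
Imports System
Imports System.IO
Imports System.Net

Public Module DownloadThenProcessSketch
    ' Hypothetical helper: downloads the pdf at 'url' to a temp file, hands the
    ' local path to 'processLocalFile', then deletes the temp copy.
    Public Sub ProcessDownloadedPdf(ByVal url As String, ByVal processLocalFile As Action(Of String))
        Dim tempPdf As String = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") & ".pdf")
        Try
            Using client As New WebClient()
                client.DownloadFile(url, tempPdf)
            End Using
            ' iTextSharp now gets an ordinary file path instead of a url.
            processLocalFile(tempPdf)
        Finally
            ' Remove the temp copy whether or not the processing succeeded.
            If File.Exists(tempPdf) Then File.Delete(tempPdf)
        End Try
    End Sub
End Module
```

With the class from the first post, a call might look like `ProcessDownloadedPdf(url, Sub(localPdf) PdfManipulation2.SplitPdfByPages(localPdf, 1, outputBaseName))`, where `outputBaseName` stands for a physical output path of your choosing.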

----------


## prabakarank

Hi ,
I need to pass the parameters like this ("http://localhost:1870/PDFWCFService/1.pdf",1,"http://localhost:1870/PDFWCFService/2.pdf") to the SplitPdfByPages method.
The output file is given in the form of a URL.
It returns the following error: "Uri format is not supported".
Please give a solution for the above problem. Please do the needful.

----------


## stanav

You need to supply the physical file paths... There's no way around it, because we rely on iTextSharp to do the work, and if iTextSharp doesn't support it, there's not much we can do about it.
However, that is not the real problem. The problem is with your methodology. While you can access (download) a file from a url, you cannot upload a file using a url. If you are going to run the splitting task on any PC, you will need to download the file to the local PC, split it, and then upload the results back. If you're going to run the splitting task on the server that hosts your web site, you have to give it the direct physical paths and not the urls. You cannot treat a url the same as a conventional file path.

----------


## prabakarank

Hi,
I got the error below:
Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'.

What is the reason I got that error? How can we avoid this type of error? Is there any solution for this problem?

----------


## stanav

Show the code where the error occurred...

----------


## prabakarank

Below is the code. I converted from Vb.net to C#. 

iTextSharp.text.pdf.PdfReader reader = null;
        iTextSharp.text.Document doc = null;
        iTextSharp.text.pdf.PdfCopy pdfCpy = null;
        iTextSharp.text.pdf.PdfImportedPage page = null;
        int pageCount = 0;
        try
        {
            reader = new iTextSharp.text.pdf.PdfReader(sourcePdf);
            pageCount = reader.NumberOfPages;
            if (pageCount < numOfPages)
            {
                return -1;
                throw new ArgumentException("Not enough pages in source pdf to split");
            }
            else
            {
                string ext = System.IO.Path.GetExtension(baseNameOutPdf);
                string outfile = string.Empty;
                int n = Convert.ToInt32(Math.Ceiling(Convert.ToDouble(pageCount) / Convert.ToDouble(numOfPages)));
                int currentPage = 1;
                for (int i = 1; i <= n; i++)
                {
                    outfile = baseNameOutPdf.Replace(ext, "_" + i + ext);
                    doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage));

                    //pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
                    pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
                    //pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, System.Net.HttpWebRequest.Create(outfile).GetResponse().GetResponseStream());
                    doc.Open();
                    if (i < n)
                    {
                        for (int j = 1; j <= numOfPages; j++)
                        {

                            page = pdfCpy.GetImportedPage(reader, currentPage);
                            pdfCpy.AddPage(page);--------Here only error is happen.
                            currentPage += 1;
                        }
                    }
                    else
                    {
                        for (int j = currentPage; j <= pageCount; j++)
                        {
                            page = pdfCpy.GetImportedPage(reader, j);
                            pdfCpy.AddPage(page);
                        }
                    }
                    doc.Close();

                }
            }
            reader.Close();
            return 1;
        }
        catch (Exception ex)--When i see the exception it will that error.
        {
            return -1;
            throw ex;
        }



Is this error happening because of this particular PDF?

----------


## stanav

> Is this error happening because of this particular PDF?


Probably... Can you upload a copy of that particular pdf file so that I can use it to investigate further?

----------


## prabakarank

Hi, I uploaded the pdf file. Please check the application with it.
It is a 3-page pdf file. The first page is split successfully. When the second page is split, it gives the following error: "Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'."

Please let me know how we can solve the issue.

----------


## vijy

I passed your pdf to the method below, and it splits all pages exactly.


```
SplitPdfByParts("E:\Vijay\E-Pub RandE\ComparedEPubPDF\ComparedEPubPDF\bin\Debug\2.pdf", 3, "temp.pdf")
```


vb Code:
Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim doc As iTextSharp.text.Document = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim pageCount As Integer = 0
        Try
            reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
            pageCount = reader.NumberOfPages
            If pageCount < parts Then
                Throw New ArgumentException("Not enough pages in source pdf to split")
            Else
                Dim n As Integer = pageCount \ parts
                Dim currentPage As Integer = 1
                Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
                Dim outfile As String = String.Empty
                For i As Integer = 1 To parts
                    outfile = baseNameOutPdf.Replace(ext, "_" & i & ext)
                    doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                    doc.Open()
                    If i < parts Then
                        For j As Integer = 1 To n
                            page = pdfCpy.GetImportedPage(reader, currentPage)
                            pdfCpy.AddPage(page)
                            currentPage += 1
                        Next j
                    Else
                        For j As Integer = currentPage To pageCount
                            page = pdfCpy.GetImportedPage(reader, j)
                            pdfCpy.AddPage(page)
                        Next j
                    End If
                    doc.Close()
                Next
            End If
            reader.Close()
        Catch ex As Exception
            Throw ex
        End Try
    End Sub

----------


## prabakarank

Hi, for me it's not working. Please tell me which version of the iTextSharp dll you used.
I have used "itextsharp-5.0.2-dll".
Please check once again whether it's working or not, and please make sure that
all the split pdf files are created.

----------


## prabakarank

Hi,
I have one question. Is it possible to set a password for each split pdf file?
Please tell me how we can do this.

----------


## stanav

> Hi, for me it's not working. Please tell me which version of the iTextSharp dll you used.
> I have used "itextsharp-5.0.2-dll".
> Please check once again whether it's working or not, and please make sure that
> all the split pdf files are created.


I've uploaded the new PdfManipulation2 class which works with itextsharp 5.0.2.

----------


## stanav

> Hi,
> I have one question. Is it possible to set a password for each split pdf file?
> Please tell me how we can do this.


I don't know of any way to set passwords on the split pdfs on the fly. However, you can certainly do it in a 2nd pass.
1st pass: split the pdf as usual.
2nd pass: use the PdfEncryptor.Encrypt method to set the user and/or owner passwords on the newly split pdfs. You can do this in a separate method after all the splitting is done, or you can set the password on each split pdf right after it is created. The 2nd approach is preferred. It's just a few extra lines of code. If you have trouble figuring it out, let me know.
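
The 2nd pass might look roughly like this. It is a minimal sketch, not the class's own SetSecurityPasswords method: `EncryptSplitParts` and the "_secured" suffix are assumptions for illustration, and the permission value (printing only) is just an example.

```vb
Imports System
Imports System.IO

Public Module EncryptPartsSketch
    ' Hypothetical helper: writes a password-protected copy of every pdf
    ' produced by an earlier split, next to the original part.
    Public Sub EncryptSplitParts(ByVal splitPdfs() As String, _
                                 ByVal userPassword As String, _
                                 ByVal ownerPassword As String)
        For Each partPath As String In splitPdfs
            Dim securedPath As String = Path.ChangeExtension(partPath, Nothing) & "_secured.pdf"
            Dim reader As New iTextSharp.text.pdf.PdfReader(partPath)
            Try
                Using output As New FileStream(securedPath, FileMode.Create)
                    ' True selects 128-bit encryption; the last argument lists
                    ' what someone who opens the file with the user password may do.
                    iTextSharp.text.pdf.PdfEncryptor.Encrypt(reader, output, True, _
                        userPassword, ownerPassword, _
                        iTextSharp.text.pdf.PdfWriter.ALLOW_PRINTING)
                End Using
            Finally
                reader.Close()
            End Try
        Next
    End Sub
End Module
```

To protect each part right after it is created, the body of the loop can be called once per output file inside the split routine, after that file's `doc.Close()`.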

----------


## nbrege

stanav ... what functions are included in your new class?

----------


## stanav

> stanav ... what functions are included in your new class?


I updated my original post to include a list of public methods in the new class.

----------


## blofvendahl

Does the MergePdfFiles routine also merge bookmarks?

----------


## stanav

> Does the MergePdfFiles routine also merge bookmarks?


No, it doesn't...

----------


## prabakarank

Hi,
I got the error below:
"PdfReader not opened with owner password"
How do we resolve this issue?

Thanks

----------


## prabakarank

Hi,
Can you give me the code to set a password for each split pdf file?

Thanks

----------


## stanav

> Hi,
> Can you give me the code to set a password for each split pdf file?
> 
> Thanks


It's already in the PdfManipulation2 class. The method is:


```
SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)
```

----------


## stanav

> Hi,
> I got the error below:
> "PdfReader not opened with owner password"
> How do we resolve this issue?
> 
> Thanks


1. You need to know the owner password of the pdf you're working on.
2. Use the 2nd overload of the PdfReader class constructor, which allows you to supply the owner password as a byte array when you create a PdfReader object. Something like this:


```
 Dim ownerPwd As String = "put the owner password here"
            Dim pwdBytes() As Byte = System.Text.Encoding.Default.GetBytes(ownerPwd)
            Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePDF, pwdBytes)
```

The rest of the code is the same.

3. If you forget the owner password for some reason, you will have to remove all restrictions on that pdf using the RemoveRestrictions method and save the new unrestricted pdf to a temp location. You then can work on that temporary unrestricted pdf as normal. When done, delete it if you don't want to keep it.

----------


## blofvendahl

Hey Stanav,

Which method in your class, if any, can be used to extract bookmark info from a pdf?

thanks
Brian

----------


## stanav

> Hey Stanav,
> 
> Which method in your class, if any, can be used to extract bookmark info from a pdf?
> 
> thanks
> Brian


You can use the SimpleBookmark class to extract all the bookmarks in a pdf and export them to an XML file if you want to. Here's how you do it:


```
 Public Shared Function ExportBookmarksToXML(ByVal sourcePdf As String, ByVal outputXML As String) As Boolean
        Dim result As Boolean = False
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Try
            reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
            ' GetBookmark returns Nothing when the pdf has no bookmarks
            Dim bookmarks As System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object)) = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)
            If bookmarks IsNot Nothing Then
                Using outFile As New IO.StreamWriter(outputXML)
                    iTextSharp.text.pdf.SimpleBookmark.ExportToXML(bookmarks, outFile, "ISO8859-1", True)
                End Using
                result = True
            End If
        Catch ex As Exception
            Throw New ApplicationException(ex.Message, ex)
        Finally
            ' Release the source file even if the export fails
            If reader IsNot Nothing Then reader.Close()
        End Try
        Return result
    End Function
```

I'm also working on a method to merge pdf files with all bookmarks preserved. However, it works only with bookmarks that use a page number as the destination. Bookmarks that use a named destination break after merging (you still see all the bookmarks, but they don't go to a destination when clicked). That's why I'm not posting the solution yet.

----------


## cyberstaind

Hi Stanav,

I would like to know how I can put 10 .jpg images into one pdf, and what code I could use to do it. I am using ASP.NET on the server side.


Regards,

Staind
 :Frown:

----------


## stanav

> Hi Stanav,
> 
> I would like to know how I can put 10 .jpg images into one pdf, and what code I could use to do it. I am using ASP.NET on the server side.
> 
> 
> Regards,
> 
> Staind


Your question is not related to the current thread at all. Please make a new post in the VB.Net forum. Make sure you describe the question clearly too.

----------


## blofvendahl

> You can use the SimpleBookmark class to extract all the bookmarks in a pdf and export it to an XML file if you want to. Here's how you do it
> 
> 
> ```
>  Public Shared Function ExportBookmarksToXML(ByVal sourcePdf As String, ByVal outputXML As String) As Boolean
>         Dim result as Boolean = False
>         Try
>             Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePdf)
>             Dim bookmarks As System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
> ...



Thanks Stanav. The SimpleBookmark solution worked great. I'll keep checking back for your method to merge PDFs with all bookmarks. That'll really come in handy.

----------


## stanav

This is the updated method for merging pdf files with all the bookmarks preserved. It is also available in the PdfManipulation2 class.


```
 Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean
        Dim result As Boolean = False
        Dim pdfCount As Integer = 0     'total input pdf file count
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim pdfDoc As iTextSharp.text.Document = Nothing    'the output pdf document
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim pageCount As Integer = 0    'number of pages in the current pdf
        Dim totalPages As Integer = 0   'number of pages so far in the merged pdf
        Dim bookmarks As New System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object))
        Dim tempBookmarks As System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object)) = Nothing
        ' Must have more than 1 source pdf's to merge
        If sourcePdfs.Length > 1 Then
            Try
                For i As Integer = 0 To sourcePdfs.GetUpperBound(0)
                    reader = New iTextSharp.text.pdf.PdfReader(sourcePdfs(i))
                    reader.ConsolidateNamedDestinations()
                    pageCount = reader.NumberOfPages
                    tempBookmarks = SimpleBookmark.GetBookmark(reader)
                    If i = 0 Then
                        pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                        pdfCpy = New iTextSharp.text.pdf.PdfCopy(pdfDoc, New System.IO.FileStream(outputPdf, IO.FileMode.Create))
                        pdfDoc.Open()
                        totalPages = pageCount
                    Else
                        If tempBookmarks IsNot Nothing Then
                            SimpleBookmark.ShiftPageNumbers(tempBookmarks, totalPages, Nothing)
                        End If
                        totalPages += pageCount
                    End If
                    If tempBookmarks IsNot Nothing Then
                        bookmarks.AddRange(tempBookmarks)
                    End If
                    For n As Integer = 1 To pageCount
                        page = pdfCpy.GetImportedPage(reader, n)
                        pdfCpy.AddPage(page)
                    Next
                    reader.Close()
                Next
                pdfCpy.Outlines = bookmarks
                pdfDoc.Close()
                result = True
            Catch ex As Exception
                Throw New ApplicationException(ex.Message, ex)
            End Try
        End If
        Return result
    End Function
```

----------


## prabakarank

Dear Team,

I split the PDF successfully with your code. But the original pdf file contains some hyperlinks, and when I split the PDF, the hyperlinks are removed. Is there any way to keep the hyperlinks in the split PDFs?

Please help me regarding this issue.

----------


## stanav

> Dear Team,
> 
> I split the PDF successfully with your code. But the original pdf file contains some hyperlinks, and when I split the PDF, the hyperlinks are removed. Is there any way to keep the hyperlinks in the split PDFs?
> 
> Please help me regarding this issue.


Can you upload a sample pdf so that I can test it out myself? I'm not promising anything, but if I have a sample file and figure out what the problem is, I may be able to find a solution for you.

----------


## prabakarank

Hi..

I have attached a sample PDF for your reference. It contains two hyperlinks.

If we split that pdf, the hyperlinks are removed.

----------


## stanav

> Hi..
> 
> I have attached a sample PDF for your reference. It contains two hyperlinks.
> 
> If we split that pdf, the hyperlinks are removed.


The sample pdf you uploaded has only 1 page with no hyperlinks. It also appears to me that this is a scanned pdf (one that is created by scanning a document through a scanner). You cannot do much with this kind of pdf file.

----------


## prabakarank

Hi,
In this PDF, there is a web address at the bottom of the page (www.craneyhill.com). If you click it, it opens the craneyhill site.

Also, on the right side of the page, there is a logo (CRANNEY HILL KENNEL). If you click that logo, it opens the site as well. Please check this.

----------


## prabakarank

Hi.

I use the code below to check whether any link annotations are present.
In a previous post I attached 2.pdf. When it is parsed, that page contains two link annotations. After splitting, it does not have hyperlinks (the link annotations do not persist). Please help me, it is urgent.

```
PdfReader reader = new PdfReader(sourcePdf);
FileStream fs = new FileStream(outputPdf, System.IO.FileMode.Open, System.IO.FileAccess.Write);
PdfStamper stamper = new PdfStamper(reader, fs);

PdfDictionary objPdfDictionary = reader.GetPageN(n);
PdfArray annotarray = (PdfArray)PdfReader.GetPdfObject(objPdfDictionary.Get(PdfName.ANNOTS));
if (annotarray != null && annotarray.Size > 0)
{
    foreach (PdfIndirectReference annot in annotarray.ArrayList)
    {
        PdfDictionary annotationDic = (PdfDictionary)PdfReader.GetPdfObject(annot);
        PdfName subType = (PdfName)annotationDic.Get(PdfName.SUBTYPE);
        if (subType.Equals(PdfName.LINK))
        {
        }
    }
}
```

----------


## stanav

Your sample pdf has only 1 page... How am I supposed to test splitting it? The only option for me to test splitting this file is to use the SplitByPages method and specify the number of pages to split = 1. The hyperlinks work fine after splitting. For further testing, you need to provide me with a sample file with more than 1 page.

----------


## blinsner

Stanav,

Thanks for posting this! I have a question if you have time...

In your Public Shared Function InsertPages, you mention "To create the pagesToInsert dictionary, you can use the iTextSharp.text.pdf.PdfCopy class to open an existing pdf file and call the GetImportedPage method".

I am new to this, can you please provide an example of that.
I have been trying to do it but I keep getting an error that my PDF is in use.

Thanks,
Brian

----------


## stanav

> Stanav,
> 
> Thanks for posting this! I have a question if you have time...
> 
> In your Public Shared Function InsertPages, you mention "To create the pagesToInsert dictionary, you can use the iTextSharp.text.pdf.PdfCopy class to open an existing pdf file and call the GetImportedPage method".
> 
> I am new to this, can you please provide an example of that.
> I have been trying to do it but I keep getting an error that my PDF is in use.
> 
> ...


OK... Suppose you have a pdf file named "pdf1" into which you want to insert some pages. Those pages are in another pdf file called "pdf2". So you need to get the pages you need from pdf2, add them to a dictionary and then call the InsertPages method to insert these pages from pdf2 into pdf1.
1. Let's say you need pages 2, 3 and 5 from pdf2, to be inserted as pages 6, 9 and 4 in pdf1. So the 1st thing you need to do is create that dictionary:


```
'Create the dictionary
Dim pdf2 As String = "path to your pdf2 file here"
Dim reader2 As New iTextSharp.text.pdf.PdfReader(pdf2)
Dim doc2 As New iTextSharp.text.Document(reader2.GetPageSizeWithRotation(1))
Dim pdfCpy As New iTextSharp.text.pdf.PdfCopy(doc2, New IO.MemoryStream())
Dim pageDict As New Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage)
'Get pages 2, 3 and 5 from pdf2 and add them to the dictionary with keys 6, 9 and 4
pageDict.Add(6, pdfCpy.GetImportedPage(reader2, 2))
pageDict.Add(9, pdfCpy.GetImportedPage(reader2, 3))
pageDict.Add(4, pdfCpy.GetImportedPage(reader2, 5))

'Insert those pages into pdf1
Dim pdf1 As String = "path to your pdf1 here"
Dim output As String = "path to the output pdf here"
PdfManipulation.InsertPages(pdf1, pageDict, output)
```

----------


## prabakarank

Hi,

My file is a password-protected file, but I don't know the password. I want to split that password-protected PDF.

How do I achieve this?

----------


## stanav

> Hi,
> 
> My file is a password-protected file, but I don't know the password. I want to split that password-protected PDF.
> 
> How do I achieve this?


This is against the forum's AUP and thus we should not discuss it here. Why can't you get the password from the creator of that pdf?

----------


## prabakarank

Hi stanav,
Thanks.
I want to upload a new file. I don't want to split the file, but I do want to set a password for it.
How can I achieve this?

----------


## moti barski

Can it add pictures to an existing pdf? If so, a walkthrough please.

----------


## stanav

> Can it add pictures to an existing pdf? If so, a walkthrough please.


If you had read the original post (post#1), you should have seen the list of available methods the PdfManipulation2 class has. Among those methods, you should have spotted the AddImageToPage method which is probably what you need.

----------


## moti barski

Do I need to download iTextSharp to use the pdf classes?
Also:



> Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)


Can you give an example (implementation)?

----------


## stanav

> Do I need to download iTextSharp to use the pdf classes?
> Also:
> 
> 
> Can you give an example (implementation)?


The thread title has the phrase " Using iTextSharp", so I think that should already answer your question. However, I just want to confirm it again: yes, you will need to download iTextSharp and reference itextsharp.dll in your project to use the code.

As for giving an example of calling a method: you simply call the method and pass in the required arguments. That's it.
What are the required arguments? Anything that is not optional.
1. sourcePdf: the full path to the source pdf file (the one that you want to add pictures to).
2. outputPdf: the full path to save the output pdf (the pdf with pictures added).
3. imgPath: the full path to the image (picture) file you want to add to the source pdf.
4. imgLocation: the (x, y) coordinate on the page where the picture should be placed, passed in as a Point.
5. imgSize: the size the image will be scaled to, passed in as a Size.
6. pages(): optional. The array of page numbers to add the image to. If omitted, the image will be added to every page in the pdf.
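
Put together, a call could look like the sketch below. It assumes PdfManipulation2.vb and itextsharp.dll are already in the project; every path, coordinate and size is a placeholder chosen for illustration, and how the class interprets the coordinate units is not shown here.

```vbnet
Imports System.Drawing

Public Module AddImageUsageSketch
    Public Sub StampLogo()
        ' Placeholder paths: replace with real files
        Dim sourcePdf As String = "C:\pdfs\report.pdf"
        Dim outputPdf As String = "C:\pdfs\report_with_logo.pdf"
        Dim imgPath As String = "C:\images\logo.png"

        ' Where to place the image and how large to make it (illustrative values)
        Dim imgLocation As New Point(36, 36)
        Dim imgSize As New Size(120, 40)

        ' Optional: only stamp pages 1 and 3. Leave this argument out to stamp every page.
        Dim pages() As Integer = {1, 3}

        PdfManipulation2.AddImageToPage(sourcePdf, outputPdf, imgPath, imgLocation, imgSize, pages)
    End Sub
End Module
```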

----------


## prabakarank

Hi Stanav,
In my application, I want to split a pdf. If I upload a 10-page PDF with a file size of 10 MB, the combined file size of the split pdfs comes to more than 20 MB. Is it possible to reduce the file size of each split pdf?

Please let me know.

Thanks in advance

----------


## prabakarank

Hi Stanav,

Please give your skype id for contacting regarding the PDF Split.

Thanks in advance.

----------


## prabakarank

Hi, Stanav,

Is it possible to read the annotations from a pdf using iTextSharp? Please do the needful.

Thanks in advance

----------


## tbird888

Hello Stanav,

I was wondering if there is a way to look at an image inside a PDF and pull its width and height in pixels in iTextSharp?

----------


## stanav

> Hello Stanav,
> 
> I was wondering if there is a way to look at an image inside a PDF and pull its width and height in pixels in iTextSharp?


You can try extracting the images and getting the Width and Height properties from them. In the PdfManipulation2 class, there is a function to extract images from a pdf.

----------


## tbird888

Stanav, thank you for the reply. I'm using the ExtractImages function to pull the images like you suggested and am encountering an error with this line:

Dim img As Drawing.Image = Drawing.Image.FromStream(memStream)

The error that is shown during debugging is this:
Item = Argument not specified for parameter 'key' of 'Public Default Property Item(key As Object) As Object'.

Do you have any thoughts on what could be causing the error? Just above the Try...Catch block where the error occurred, the Byte array is being populated so something is breaking down inside the Using block.

----------


## prabakarank

Hi Stanav,

I need to convert a single-page pdf into an image using iTextSharp. Is this possible? If so, please give me sample code.

Thanks

----------


## stanav

> Hi Stanav,
> 
> I need to convert a single page pdf in to image using iTextSharp. Is this possible? If so, please give me sample code.
> 
> Thanks


No, it's not possible with iTextSharp.

----------


## craigison

I have large PDF files, 14000 pages or so. iTextSharp fails in

```
public PdfDictionary() : base(DICTIONARY) {
    hashmap = new Dictionary<PdfName,PdfObject>();
}
```

The error I get is "An Unhandled Exception of type 'System.StackOverflowException' Occuring in itextsharp.dll"

I know this is not your code, but can you tell me where to get a resolution? It only occurs on PDFs with large page counts.

----------


## stanav

> I have parge PDF Files. 14000 pages or so. The Isharptext fails in 
> Public PdfDictionary() : base(DICTIONARY) {   
>    hashmap =new Dictionary<PdfName,PdfObject>();
> }
> 
> the error i get is "An Unhandled Exception of type 'System.StackOverflowException' Occuring in itextsharp.dll"
> 
> I know this is not your code but can you tell me where to get a resolution? It only occures on PDF's with large page counts.


What are you trying to do with those huge pdf files? Is it possible to split them into multiple smaller ones first and then process those?

----------


## craigison

This company sends us reports in PDF. I can't control the number of pages they put in a report. I have attempted to use PdfManipulation2.vb, which uses iTextSharp, to extract them to single pages. Everything fails with the large files, including splitting them. Stack overflow, every time. Where do I go from here?

----------


## stanav

So you're splitting the original pdf into multiple 1 page pdf's, is that correct? If it is, then you should not have any problem, just need to change the way you read the original pdf. This should do it:


```
 Public Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim doc As iTextSharp.text.Document = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim pageCount As Integer = 0

        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            pageCount = reader.NumberOfPages
            If pageCount < numOfPages Then
                Throw New ArgumentException("Not enough pages in source pdf to split")
            Else
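                Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
                ' Strip the extension once so "_<n>" can be inserted in front of it
                Dim stem As String = baseNameOutPdf.Substring(0, baseNameOutPdf.Length - ext.Length)
                ' Fall back to ".pdf" when the base name has no extension
                If ext.Length = 0 Then ext = ".pdf"
                Dim outfile As String = String.Empty
                Dim n As Integer = CInt(Math.Ceiling(pageCount / numOfPages))
                Dim currentPage As Integer = 1
                For i As Integer = 1 To n
                    outfile = stem & "_" & i & ext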
                    doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                    doc.Open()
                    If i < n Then
                        For j As Integer = 1 To numOfPages
                            page = pdfCpy.GetImportedPage(reader, currentPage)
                            pdfCpy.AddPage(page)
                            currentPage += 1
                        Next j
                    Else
                        For j As Integer = currentPage To pageCount
                            page = pdfCpy.GetImportedPage(reader, j)
                            pdfCpy.AddPage(page)
                        Next j
                    End If
                    doc.Close()
                Next
            End If
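        Finally
            ' Release the source file even if an error occurs; exceptions
            ' propagate to the caller with their original stack trace
            If reader IsNot Nothing Then reader.Close()
        End Try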
    End Sub
```

And you would call the sub like this:


```
Dim sourcePdf As String = "path to your huge pdf file here"
Dim numOfPage As Integer = 1        '< 1 page per output pdf
Dim baseName As String = "Splitted.pdf"   '< the base file name for output pdf's; "_1", "_2", ... goes before the extension
SplitPdfByPages(sourcePdf, numOfPage, baseName)
```
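
The page arithmetic in SplitPdfByPages can be checked on its own, without iTextSharp. The sketch below is only an illustration of that arithmetic: `PlanSplit` is a hypothetical helper, not part of PdfManipulation2, and it returns the first and last source page that would go into each output file.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SplitPlanSketch
    ' Hypothetical helper: first and last source page for each output file.
    Public Function PlanSplit(ByVal pageCount As Integer, ByVal pagesPerFile As Integer) As List(Of Tuple(Of Integer, Integer))
        If pagesPerFile < 1 Then Throw New ArgumentOutOfRangeException("pagesPerFile")
        Dim ranges As New List(Of Tuple(Of Integer, Integer))
        ' Same rounding as in SplitPdfByPages: a partial last chunk still gets its own file
        Dim fileCount As Integer = CInt(Math.Ceiling(pageCount / pagesPerFile))
        For i As Integer = 0 To fileCount - 1
            Dim firstPage As Integer = i * pagesPerFile + 1
            ' The last file may hold fewer pages than pagesPerFile
            Dim lastPage As Integer = Math.Min(firstPage + pagesPerFile - 1, pageCount)
            ranges.Add(Tuple.Create(firstPage, lastPage))
        Next
        Return ranges
    End Function
End Module
```

For a 10-page source split 3 pages at a time, this gives four files covering pages 1-3, 4-6, 7-9 and 10-10.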

----------


## ashok.arumugam

Hi Stanav,

Your code has helped me a lot in my application. I have a question about PDF hyperlinks.

I used your sample to split my PDF file. Consider a PDF file named sample.pdf containing 72 pages. Some pages in sample.pdf have hyperlinks that navigate to other pages. E.g. on page 4 there are three hyperlinks which, when clicked, navigate to pages 24, 27 and 28. Like page 4, there are nearly 12 pages that have such hyperlinks. Following your code, I split this PDF into 72 separate files, saved as 1.pdf, 2.pdf, ... 72.pdf. So when the hyperlinks in 4.pdf are clicked, I need the PDF to navigate to 24.pdf, 27.pdf and 28.pdf. How can I set the hyperlinks in 4.pdf so that they navigate to the corresponding pdf files? I.e. I need to edit the destination of each hyperlink and make it point to another destination, or else I need to place some text (e.g. pageTo:24) in the url of the link. Please help me.

NOTE: The links in the PDF hold a PRIndirectReference for linking within the pages. I have attached a sample PDF file (4.pdf). Please provide sample code for this.

Thank you,
Ashok

----------


## ashok.arumugam

Hi Stanav, 
Can you please help me extract URLs from a PRIndirectReference? The ExtractURLs function is able to fetch the URL from a link created with the Anchor class, but I am not able to extract the URLs or links from a PRIndirectReference. I need to get the reference link and edit it to navigate to another location. E.g. in the attached 4.pdf there are 5 links (on the text 18, 68, 17, 48, 52), each navigating to page 18, page 68, and so on. 4.pdf was split from sample.pdf, which has 72 pages. Now that I have split that PDF into 72 separate PDF files, I need to navigate from one file to another, so that clicking 18, 68, 17... navigates to 18.pdf, 68.pdf, 17.pdf. Please help me with code to edit those links which use a PRIndirectReference.

Thank you,
Ashok

----------


## kakahappns

> Thanks stanav... 
> yep i tried and i get... 
> 
> 
> Splitting Code:
> ```
> Public Function SplitPdfFiles(ByVal iStartPage As String, ByVal iEndPage As String, ByVal sPDFPath As String) As Boolean
>     Try
>         'Variables to hold the split file informations
> 
>         Dim reader As PdfReader = New PdfReader(sPDFPath)
>         reader.RemoveUnusedObjects()
>         reader.ConsolidateNamedDestinations()
>         Dim importedPage As PdfImportedPage = Nothing
>         Dim currentDocument As New Document
>         Dim pdfWriter As PdfSmartCopy = Nothing
> 
>         Dim bIsFirst As Boolean = True
>         For j As Integer = iStartPage To iEndPage
>             If bIsFirst Then
>                 bIsFirst = False
>                 currentDocument = New Document(reader.GetPageSizeWithRotation(1))
>                 pdfWriter = New PdfSmartCopy(currentDocument, New System.IO.FileStream(System.IO.Path.GetDirectoryName(sInFile) & "\" & sSplitName, System.IO.FileMode.Create))
>                 pdfWriter.SetFullCompression()
>                 ' pdfWriter.CompressionLevel = PdfStream.BEST_COMPRESSION
>                 pdfWriter.PdfVersion = reader.PdfVersion
>                 currentDocument.Open()
>             End If
>             importedPage = pdfWriter.GetImportedPage(reader, j)
>             pdfWriter.AddPage(importedPage)
>         Next
>         Dim bookMark As New ArrayList
>         bookMark = SimpleBookmark.GetBookmark(reader)
> 
>         If bookMark IsNot Nothing Then
>             SimpleBookmark.EliminatePages(bookMark, New Integer() {iEndPage + 1, reader.NumberOfPages})
>             If iStartPage > 1 Then
>                 SimpleBookmark.EliminatePages(bookMark, New Integer() {1, iStartPage})
>                 SimpleBookmark.ShiftPageNumbers(bookMark, -(iStartPage - 1), Nothing)
>             End If
>             pdfWriter.Outlines = bookMark
>         End If
>         currentDocument.Close()
>         pdfWriter.Close()
>         Return True
>     Catch ex As Exception
>     End Try
>     Return False
> End Function
> ```
> 
> this one working fine.. and the pdf extracting with actual bookmarks..
> 
> ...


Stanav, was there an update for the question about child bookmarks?

----------


## vijy

> Hey Stanav,
> 
> Which method in your class, if any, can be used to extract bookmark info from a pdf?
> 
> thanks
> Brian


iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader) 

see post   #14: 

it will help..

----------


## kakahappns

> iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader) 
> 
> see post   #14: 
> 
> it will help..


vijy,

I have been trying to get the starting and ending page numbers for the lowest-level bookmarks that exist, and then use them to extract those pages to a new PDF. I am able to get to the 1st-level information, but I cannot figure out how to dive deeper into the bookmark to get the kids.

i.e.
1st level bookmark 1
page number
  2nd level bookmark 1
  page number - 1st starting pdf extract page  ** this is what I cannot figure out how to get to 
     3rd level bookmark 1
      page number - 1st ending pdf extract page, also the 2nd starting pdf extract page  ** this is what I cannot figure out how to get to
1st level bookmark 2
page number - 2nd ending pdf extract page
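
One way to reach the nested levels, sketched under assumptions: with iTextSharp 5, each bookmark returned by SimpleBookmark.GetBookmark is a dictionary, child bookmarks sit under a "Kids" key as another list of the same kind, and the "Page" value starts with the page number (for example "12 Fit"). `CollectBookmarks` is a hypothetical helper, not part of PdfManipulation2, and it only needs the .NET collections to run.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module BookmarkWalkSketch
    ' Hypothetical helper: depth-first walk over a bookmark list.
    ' Adds one "level|title|page" string per bookmark to found.
    Public Sub CollectBookmarks(ByVal bookmarks As IEnumerable(Of Dictionary(Of String, Object)), _
                                ByVal level As Integer, ByVal found As List(Of String))
        If bookmarks Is Nothing Then Return
        For Each bm As Dictionary(Of String, Object) In bookmarks
            Dim title As String = ""
            Dim page As String = ""
            Dim value As Object = Nothing
            If bm.TryGetValue("Title", value) Then title = CStr(value)
            ' Assumed format: the page number is the first space-separated token
            If bm.TryGetValue("Page", value) Then page = CStr(value).Split(" "c)(0)
            found.Add(level & "|" & title & "|" & page)
            ' Recurse into the children, if any
            If bm.TryGetValue("Kids", value) Then
                CollectBookmarks(TryCast(value, IEnumerable(Of Dictionary(Of String, Object))), level + 1, found)
            End If
        Next
    End Sub
End Module
```

Calling `CollectBookmarks(SimpleBookmark.GetBookmark(reader), 1, found)` would then list every level in document order, so the start page of one entry and the start page of the next give the page range to extract.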

----------


## tbird888

Stanav,
Have you had any experience with the new PDF Portfolios (also known as Portable Collections)? I've been having trouble extracting more than the first page using iTextSharp (only recognizes the first page). Any thoughts?

----------


## GeoCrystal

Hi Stanav, 
May be a stupid doubt, but.....
Can I use the same code for a web application? coz, I think iTextSharp does not work on relative paths....

----------


## stanav

> Hi Stanav, 
> May be a stupid doubt, but.....
> Can I use the same code for a web application? coz, I think iTextSharp does not work on relative paths....


Yes, you certainly can. You just don't pass in relative paths. Instead, you can get the physical path from a relative path using the Server.MapPath method in asp.net, or something equivalent on other platforms.
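
A minimal sketch of that conversion for ASP.NET Web Forms or a handler. The helper name `ToPhysicalPath` and the virtual path in the usage note are made up for illustration.

```vbnet
Imports System.Web

Public Module WebPathSketch
    ' Hypothetical helper: turns an app-relative path such as "~/App_Data/input.pdf"
    ' into a physical path that can be handed to the pdf methods.
    ' Must be called while a request is being processed, so HttpContext.Current is set.
    Public Function ToPhysicalPath(ByVal virtualPath As String) As String
        Return HttpContext.Current.Server.MapPath(virtualPath)
    End Function
End Module
```

For example, `Dim sourcePdf As String = ToPhysicalPath("~/App_Data/input.pdf")` could be passed wherever the class expects a full path.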

----------


## rktech

> This thread was originally about extracting and merging pdf files using iTextSharp. However, as time goes by, I have added a lot more code to do other stuff and put them all together into a handy class called PdfManipulation. There are 2 classes as below (choose the one that matches the iTextSharp version you're using):
> 
> 1. The original PdfManipulation.vb class is coded based on itextsharp version 4. This class is obsolete and no longer maintained.
> 
> 2. The updated PdfManipulation2.vb class is for the newer itextsharp version 5. This class also contains alot more methods than the original one and I highly recommend it over the old one. I will update this class from time to time to fix bugs and/or add more functionality. Consider it's a work in progress 
> 
> _Please verify the version of iTextSharp you're using and download the correct class._
> 
> Below is the list of public methods in the new PdfManipulation2 class
> ...


Hello Stanav,

First of all, thanks for your valuable post. It helped me a lot with a few fuzzy things.
I am using itextsharp dll version 5.1.1.0 along with your PDFManipulation2.vb code.

I want to remove the restrictions of a PDF file which has 128-bit AES encryption. But I only have the user password and do not have the owner password.

For that I tried your method named "RemoveRestrictions", but it throws an exception stating "Bad User Password". Could you please help me with this?

Thanks in advance...

----------


## stanav

> Hello Stanav,
> 
> First of all thanks for your valuable post. It helped me a lot in few fuzzy things.
> I am using the itextsharp 5.1.1.0 dll version along with your PDFManipulation2.vb code.
> 
> I want to remove restrictions of the PDF File which has 128 bit AES encryption. But i only have user password & do not have owner password.
> 
> For that i tried your method named "RemoveRestrictions" but it's throwing exception stating "Bad User Password". So could you please help me in this ?
> 
> Thanks in advance...


I've made some changes in the RemoveRestrictions method to allow a password to be passed in if the restricted pdf is password protected. You need to re-download the PdfManipulation2.vb class.

----------


## rktech

> I've made some changes in the RemoveRestrictions method to allow a password to be passed in if the restricted pdf is password protected. You need to re-download the PdfManipulation2.vb class.


================
Stanav,

Thanks for your reply. I tried the steps you mentioned, but it still gives "Bad User Password" when I pass the user password rather than the owner password. My requirement is to remove restrictions with the user password, not the owner password.

Your help in achieving this would be great.

Thanks in advance....

----------


## stanav

It should work with user password. I've tried it myself.
Can you upload a sample password protected pdf and give me the user password?
It could be the version of iTextSharp you're using (5.1.1.0). I was testing it using the older version 5.0.2.0.

----------


## rktech

Dear Stanav,

Thanks for your valuable reply.
Yes, I tried it and it worked. I was making a mistake earlier.
But thanks a ton for your valuable support.


Also, I am stuck on another .NET setup problem, described below. Can you share your input on it if possible?

I have a .NET Windows application that needs to be distributed, so I made a setup and deployment project, which created 3 files:

a) MSI installer file.
b) Setup file.
c) dotnetfx30 file, as .NET is a prerequisite for this application and needs to be installed first.

The goal is to have a single .exe file instead of the 3 files above. How can we accomplish this?
Please help...

----------


## podpolanec

Thank you stanav, itextsharp.dll ver. 5.1.2.0 works great with your class PdfManipulation2.vb,
at the first attempt and without any problem. So, is there any way to donate to you?

----------


## stanav

> Thank you stanav, itextsharp.dll ver. 5.1.2.0 works great with your class PdfManipulation2.vb,
> at the first attempt and without any problem. So, is there any way to donate to you?


I'm glad that the code I shared here has helped you. And thank you for your offer but it's my philosophy not to accept any monetary offers for helping others.
You can rep me though  :Wink: 

Happy coding,
Stanav.

----------


## rktech

Dear Stanav,

Thanks for the valuable post that you have shared on this forum.
I have one scenario where I need your input. I tried your code for removing PDF restrictions on a file with 128-bit AES encryption, using itextsharp dll version 4.0.3, and it worked great.

But when I try to remove restrictions from a pdf whose encryption type is 256-bit AES, it throws an error and fails to remove the restrictions.

Can you please help me with this?

Thanks in advance.

----------


## Skylimitz

Hi, Friends 

I am new to this forum. Please help me with my problem: both the source and the desired output PDF files are attached. Please help me produce the output file, as I have more than 500 pages in one source file and more than 20 PDF files.

It's very urgent.


- Thanks in Advance

----------


## stanav

> Hi, Friends 
> 
> I am  new to this forum please help me with my problem both source and output pdf files are attached please help me producing the output file as i have more than 500 pages in one source file and more than 20 pdf files.
> 
> Its Very urgent
> 
> 
> - Thanks in Advance


Are you asking me to do the work for you? Sorry, but it doesn't work that way. If you have a coding question, then I'll be more than happy to try to give you an answer if I can.

----------


## Skylimitz

Thanks for reply 

But please let me know of some technique that can help me do this. I don't want you to do the coding, but you could post some techniques that would help.

thanks in advance

----------


## stanav

Based on the sample pdf files you posted, there is no easy way to split the source file to get the output you want. You are trying to split the data in 1 pdf page into several pages. To do this, you will need to read the source file data (if possible) and then create new output pdf files on the fly.

----------


## rktech

Dear Stanav,

Thanks for the valuable post that helped me in my projects.

I need one more input from you: a way to remove the restrictions from a PDF file 
that is encrypted with 256-bit encryption with a user and/or owner password.

Really looking for your expert reply on this.

thanks n regards....
RkTech

----------


## stanav

> Dear Stanav,
> 
> Thanks for the valuable post that helped me in my projects.
> 
> I need 1 more input from your side i.e. the way out for removing the PDF file 
> restrictions which is encrypted with 256 bit encryption with user and/or owner password.
> 
> Really looking for your expert reply on this.
> 
> ...


As far as I know, the current version of iTextSharp doesn't support AES 256 encryption. Hopefully the developers will implement it in the next release of iTextSharp.

----------


## rktech

Thanks for your reply, stanav. iText 5.1.1 supports 256-bit encryption.




> As far as I know, the current version of iTextSharp doesn't support AES 256 encryption. Hopefully the developers will implement it in the next release of iTextSharp.

----------


## stanav

> thanx 4 ur reply stanav. itext 5.1.1 supports 256 bit encryption.


Oh, really? I didn't know that. I'm still on 5.0.2 (no immediate need to upgrade yet). 
I still wonder though: if 5.1.1 supports AES-256 as you said, then why wouldn't it work? Have you tried using the UTF-8 encoder to convert the password to a byte array and then passing it to the constructor of the PdfReader class? In my code, I was using the system default encoder, which might not be UTF-8 in other regions.
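
A minimal sketch of the UTF-8 suggestion, assuming a project reference to iTextSharp 5.x. The module and function names are made up for illustration; the only library call used is the `PdfReader` constructor that takes a file path and an owner-password byte array.

```vb
Imports System.Text
Imports iTextSharp.text.pdf

Public Module Utf8PasswordSketch
    ' Hypothetical helper: opens a protected pdf with the password encoded as UTF-8
    ' instead of the system default code page (Encoding.Default).
    Public Function OpenWithUtf8Password(ByVal sourcePdf As String, ByVal password As String) As PdfReader
        Dim pwdBytes As Byte() = Encoding.UTF8.GetBytes(password)
        Return New PdfReader(sourcePdf, pwdBytes)
    End Function
End Module
```

For a plain ASCII password both encoders produce the same bytes, so this should only make a difference when the password contains accented or other non-ASCII characters.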

----------


## FullQuiver

Hi stanav,

Thank you for this information.  It has been very helpful.

Can the SplitPdfByParts method split a single sheet into 4 separate sheets by dividing the original sheet into 4 equal quadrants?  What I mean by this is cutting it down the middle both horizontally and vertically.

Thanks.

----------


## stanav

> Hi stanav,
> 
> Thank you for this information.  It has been very helpful.
> 
> Can the SplitPdfByParts method split a single sheet into 4 seperate sheets by dividing the original sheet into 4 equal quadrants?  What I mean by this is down the middle horizontally and vertically.
> 
> Thanks.


It's much easier to do it the other way around (merging 4 pages into 1 page) :Wink: 
As for your question, it depends on what kind of data you have in the source pdf page. If you can somehow extract the data from the page, you can then use that data to create 4 new pages in a new pdf document.

----------


## FullQuiver

Thank you for your quick reply.  I have multi page pdf's where some of the pages look like the example provided.  I need to split those 4 blocks off into individual sheets.

Thanks for the help.

----------


## rktech

Yes Stanav, 5.1.1 supports 256-bit AES. Since I consider you an expert on PDF, it would be great if you could try out the different encoding options, as I have less knowledge in this area.

Also, please let me know if you need any input or want to discuss it.

Thanks....
Take Care.. 
============



> Oh, really? I didn't know that. I'm still on 5.0.2 (no immediate need to upgrade yet). 
> I still wonder though, if that is the case as you said 5.1.1 supports 256 AES then why wouldn't it work? Have you tried using UTF-8 encoder to convert the password to bytes array and then passing it to the constructor of the pdfreader class? In my code, I was using the system default encoder, which might not be UTF8 in other regions.

----------


## stanav

> Yes Stanav 5.1.1 supports 256 bit AES.  As i treat you as an expert in this PDF subject so it would be great if you try your encoding permutation & combinations in same, as i do have less knowledge in same area.
> 
> Also please lemme know in case of any input or discussion.
> 
> Thanks....
> Take Care.. 
> ============


I downloaded the newest version of itextsharp 5.1.3 to try out and I can't seem to find any evidence of it supporting AES_256 encryption. If you look at all the encryption constants in the PdfEncryption class, there are only 3: standard_encryption_40, standard_encryption_128 and aes_128. If it supported aes 256, I should've seen the declaration of aes_256 constant.
Can you provide some links/source where it is confirmed that aes_256 is supported?

----------


## rktech

Dear Stanav,

I checked this in iText 5.1.1 and found PdfWriter.ENCRYPTION_AES_256 in the PdfWriter class.

Please let me know if you need any additional information.

thanks,
RkTech  

===============



> I downloaded the newest version of itextsharp 5.1.3 to try out and I can't seem to find any evidence of it supporting AES_256 encryption. If you look at all the encryption constants in the PdfEncryption class, there are only 3: standard_encryption_40, standard_encryption_128 and aes_128. If it supported aes 256, I should've seen the declaration of aes_256 constant.
> Can you provide some links/source where it is confirmed that aes_256 is supported?

----------


## stanav

> Dear Stanav,
> 
> I have checked same in itext 5.1.1 & found PdfWriter.ENCRYPTION_AES_256 in the PdfWriter class.
> 
> Please lemme know for any additional information.
> 
> thanks,
> RkTech  
> 
> ===============


Dear RkTech,
After I saw you mention the PdfWriter.ENCRYPTION_AES_256 constant, I took a closer look at the PdfReader class source code and you're right, it does support AES-256. According to Adobe's PDF specification, it uses a new algorithm, 3.2a, to compute the encryption key. This algorithm takes the first 127 bytes of the UTF-8 password (anything longer gets truncated) + 8 bytes of salt + the 48-byte U string and then computes the SHA-256 hash of the result, which is used to validate the password and to derive the key that unlocks the AES-256 file encryption key...
Anyway, I'd give it a try if you can provide me a sample aes-256 password protected pdf.
Happy Holidays.....
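
As a rough illustration of the hash step described above, here is a sketch of the owner-password check as I understand it from Adobe's extension to the PDF specification: the hash is SHA-256 over the truncated UTF-8 password, the 8-byte owner validation salt (bytes 32 to 39 of the O string) and the 48-byte U string, and it is compared with the first 32 bytes of O. The function name and parameters are hypothetical, the SASLprep normalization the specification asks for is skipped, and this is not taken from iTextSharp's source.

```vb
Imports System
Imports System.Security.Cryptography
Imports System.Text

Public Module Aes256OwnerCheckSketch
    ' oEntry and uEntry are assumed to be the raw 48-byte /O and /U strings
    ' from the pdf's encryption dictionary.
    Public Function IsOwnerPassword(ByVal password As String, ByVal oEntry As Byte(), ByVal uEntry As Byte()) As Boolean
        Dim pwd As Byte() = Encoding.UTF8.GetBytes(password)
        Dim pwdLen As Integer = Math.Min(pwd.Length, 127) ' anything longer is truncated

        ' password bytes + 8-byte validation salt + 48-byte U string
        Dim buffer(pwdLen + 8 + 48 - 1) As Byte
        Array.Copy(pwd, 0, buffer, 0, pwdLen)
        Array.Copy(oEntry, 32, buffer, pwdLen, 8)
        Array.Copy(uEntry, 0, buffer, pwdLen + 8, 48)

        Dim hash As Byte()
        Using sha As SHA256 = SHA256.Create()
            hash = sha.ComputeHash(buffer)
        End Using

        ' The password is the owner password if the hash equals the first 32 bytes of O
        For i As Integer = 0 To 31
            If hash(i) <> oEntry(i) Then Return False
        Next
        Return True
    End Function
End Module
```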

----------


## stanav

Dear RkTech,
I've made some changes to the PdfManipulation2 class and it now supports opening pdf files with AES-256 password protection. Please download the new version and try it out.

Happy Holidays.
Stanav.

----------


## Skylimitz

Hi everyone, I want to write a program in VB .NET to find the invalid / unreferenced keys in the Registry and fix them. Could anyone tell me the technique for doing this, or feel free to provide a little code?

Thanks in Advance

----------


## jjsaw5

Hey Stan, 

I have an Access database that creates a PDF report. In this report each page is a separate invoice. Each invoice has the same layout; only the data on the pages differs.  What I'd like to do is take this PDF report and split each invoice into its own separate pdf file. Is this something I can do with what you have provided? If so, do you have any suggestions on what I can use to complete this? 

Thanks very much for your time!

----------


## stanav

> Hey Stan, 
> 
> I have an access database that creates a PDF report. In this report each page is a separate invoice. Each invoice is the same layout only the data on the pages differ.  What i'd like to do is take this PDF report and split each invoice into it's own separate pdf file. Is this something i can use what you have provided us with  to do? If so do you have any suggestions on what i can use to complete this. 
> 
> Thanks very much for you time!


If all the invoices are in the same format (same number of pages) and you know beforehand how many pages each individual invoice should have, you can use the PdfManipulation2.SplitPdfByPages method, which can be found in the PdfManipulation2 class. On the other hand, if the number of pages in each invoice differs, as long as you know the number of pages for each of them, you can modify the above-mentioned method a little to have it split the pdf the way you want (it's tough to find out how many pages each invoice has, though).
Best luck.
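
For readers who want to see the general shape of a fixed-size split, here is a minimal sketch using iTextSharp 5.x's `PdfCopy`. It is not the SplitPdfByPages code from the class; the module name, method name and the `invoice_001.pdf` naming scheme are assumptions made for illustration.

```vb
Imports System
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module SplitEveryNPagesSketch
    ' Writes pages 1..n, n+1..2n, ... of sourcePdf to separate files in outFolder.
    Public Sub SplitEveryNPages(ByVal sourcePdf As String, ByVal outFolder As String, ByVal pagesPerFile As Integer)
        If pagesPerFile < 1 Then Throw New ArgumentOutOfRangeException("pagesPerFile")

        Dim reader As New PdfReader(sourcePdf)
        Try
            Dim part As Integer = 1
            For first As Integer = 1 To reader.NumberOfPages Step pagesPerFile
                Dim last As Integer = Math.Min(first + pagesPerFile - 1, reader.NumberOfPages)
                Dim outPath As String = Path.Combine(outFolder, String.Format("invoice_{0:000}.pdf", part))

                Dim doc As New Document(reader.GetPageSizeWithRotation(first))
                Using fs As New FileStream(outPath, FileMode.Create)
                    Dim copy As New PdfCopy(doc, fs)
                    doc.Open()
                    For p As Integer = first To last
                        copy.AddPage(copy.GetImportedPage(reader, p))
                    Next
                    doc.Close()
                End Using
                part += 1
            Next
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

With one invoice per page, as in the question, calling it with `pagesPerFile` set to 1 would give one file per invoice.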

----------


## Maverik

Hi stanav and thank you for your work
I want to open a pdf that is encrypted with a digital certificate. 
Can you post a sample of how to do it? 

Thank you!

----------


## StanMan

Hi All,

I was wondering if any of you can help me out with my problem. I am calling iTextSharp from a VB .NET Windows Forms app to create a PDF report.

How can I:

1. Create multiple headers (header 1, header 2 etc., logo)
2. Create multiple footers (Page X of Y, some other text etc.)
3. Position data from SQL Server on the PDF

I would really appreciate if any kind soul can give me some direction or code sample.

Thanks in advance.

StanMan

----------


## jptillman

I'm using PdfManipulation2 with iTextSharp 5.2 and it appears that ExtractPdfPage() may have a significant bug.

When I perform the following operations on a 3 page document with each page containing a single TIF image (such as a 3-page fax converted to PDF):

PdfManipulation2.ExtractPdfPage("c:\temp\test.pdf", 1, "c:\temp\test1.pdf")
PdfManipulation2.ExtractPdfPage("c:\temp\test.pdf", 2, "c:\temp\test2.pdf")
PdfManipulation2.ExtractPdfPage("c:\temp\test.pdf", 3, "c:\temp\test3.pdf")

then...
test1.pdf will be approx 1/3 the size of test.pdf
test2.pdf will be approx 2/3 the size of test.pdf
test3.pdf will be just shy of the full size of test.pdf

When viewed in a PDF viewer, each extracted document will show the proper single page and nothing else.  Everything looks fine.

If i open up test3.pdf and look for the /CCITTFAXDECODE entries, I find 3 of them, one for each page of the original PDF.

Based on these results, it appears that the ExtractPdfPage function (or the iTextSharp code that it uses) is failing to remove vestiges of the pages which precede the extracted page -- specifically the images they contain.

Is this bug happening for anyone else?  If so, does anyone know how to fix it?

----------


## stanav

> I'm using PdfManipulation2 with iTextSharp 5.2 and it appears that ExtractPdfPage() may have a significant bug.
> 
> When I perform the following operations on a 3 page document with each page containing a single TIF image (such as a 3-page fax converted to PDF):
> 
> PdfManipulation2.ExtractPdfPage("c:\temp\test.pdf", 1, "c:\temp\test1.pdf")
> PdfManipulation2.ExtractPdfPage("c:\temp\test.pdf", 2, "c:\temp\test2.pdf")
> PdfManipulation2.ExtractPdfPage("c:\temp\test.pdf", 3, "c:\temp\test3.pdf")
> 
> then...
> ...


Thanks for raising the issue. Currently, I don't have much free time to investigate the bug. I'll look into it when I have a chance.

Happy coding.
Stanav.

----------


## kareninstructor

Just used the code for a post over at the MS Social Forums, works like a charm.

----------


## Invincible_arya

Hey Stanav

This is my first time working with PDF documents. I was searching for how to rotate a PDF page, and I found some code samples for rotations by 90, 180, 270 and 360 degrees.

But my problem is that I need a very small rotation, on the order of a few degrees.
Is there any way to specify the angle using a slider bar, or to use the mouse to click and rotate the PDF page?

Thanks in Advance 

Regards
Aryan

----------


## sumogan

PdfManipulation2 is a nice bundled class. Kindly include a PFX signing function, which would add one more feather to the class.

SunilMogan

----------


## M@dH@tter

I'm getting really confused by this:
All I need to do is get the bookmarks and the pdf file names they point to.
All of the bookmarks point to other pdf files, which should be on the HD. However, I've noticed that at the time I downloaded these files, some of the files they were supposed to have were not on the server, or I didn't download them, or something.

So now I have hundreds of files with missing pdfs, too many to go through and correct manually.
So I thought that I could extract the bookmarks and the files they point to, then take those file names, add the web address for each one (all located on the same server) and make a download list.
So I did this:


```
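Option Explicit On
Imports iTextSharp.text.pdf 'SimpleBookmark lives here; the project must reference itextsharp.dll

Public Class Form1
    'Path to the source pdf file
    Dim sourcePdf As String = "E:\pdfs\1.pdf"

    'Path of the XML file to save the bookmarks to
    Dim outputXML As String = "E:\1.xml"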


    Public Shared Function ExportBookmarksToXML(ByVal sourcePdf As String, ByVal outputXML As String) As Boolean
        Dim result As Boolean = False
        Try
            Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePdf)
            Dim bookmarks As System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
            Using outFile As New IO.StreamWriter(outputXML)
                SimpleBookmark.ExportToXML(bookmarks, outFile, "ISO8859-1", True)
            End Using
            reader.Close()
            result = True
        Catch ex As Exception
            Throw New ApplicationException(ex.Message, ex)
        End Try
        Return result
    End Function

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ExportBookmarksToXML(sourcePdf, outputXML)
    End Sub
End Class
```

I get errors saying things are not declared, so I searched through the iTextSharp zip file and found the ones that were missing. Then those started saying this was missing, that was missing, this wasn't declared... it never ended.
It was as if I needed to have every single one of those class files in my project.

I had to convert those to VB.NET too; luckily I found an online bulk converter that worked.

So I have to ask if someone could PLEASE tell me how I can fix this so I can get it to do what I need. I'm going around in circles here. Or, even better, put all the subs that I need into one module file for me.

Thanks

----------


## M@dH@tter

Well, at the time I posted this I didn't know there was a dll, hehe. I was trying to add all the source code to the project, but now that I've added the .dll it's working well. How can I export to a txt file though, not xml?
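
One possible way to get a plain text list instead of XML is to walk the bookmark list directly. The sketch below assumes iTextSharp 5.x, where `SimpleBookmark.GetBookmark` returns a list of dictionaries, and assumes that the dictionary keys are "Title", "File" (for bookmarks that open another pdf) and "Kids" (for nested bookmarks), which is how I understand SimpleBookmark to name them. The module and method names are made up for illustration.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic
Imports iTextSharp.text.pdf

Public Module BookmarkTextExportSketch
    ' Writes one line per bookmark: indented title, a tab, then the target file (if any)
    Public Sub ExportBookmarksToText(ByVal sourcePdf As String, ByVal outputTxt As String)
        Dim reader As New PdfReader(sourcePdf)
        Try
            Dim bookmarks As IList(Of Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
            Using outFile As New StreamWriter(outputTxt)
                ' GetBookmark returns Nothing when the pdf has no bookmarks
                If bookmarks IsNot Nothing Then WriteLevel(bookmarks, outFile, 0)
            End Using
        Finally
            reader.Close()
        End Try
    End Sub

    Private Sub WriteLevel(ByVal items As IList(Of Dictionary(Of String, Object)), ByVal outFile As TextWriter, ByVal depth As Integer)
        For Each item As Dictionary(Of String, Object) In items
            Dim title As String = If(item.ContainsKey("Title"), CStr(item("Title")), "")
            Dim target As String = If(item.ContainsKey("File"), CStr(item("File")), "")
            outFile.WriteLine(New String(" "c, depth * 2) & title & vbTab & target)

            ' Nested bookmarks are stored under the "Kids" key (assumption, see above)
            If item.ContainsKey("Kids") Then
                WriteLevel(DirectCast(item("Kids"), IList(Of Dictionary(Of String, Object))), outFile, depth + 1)
            End If
        Next
    End Sub
End Module
```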

----------


## M@dH@tter

Different question about this ..
is it possible to use this to edit bookmarks..
so if my main pdf file has bookmarks to say \1.pdf, \2.pdf. \3.pdf etc,..
and i wanted to put the bookmarked files into a separate folder instead of having them in with the main pdf file..can i alter the bookmarks to reflect the new directory where they can be found.
like     \temp\1.pdf

????

----------


## stanav

> Different question about this ..
> is it possible to use this to edit bookmarks..
> so if my main pdf file has bookmarks to say \1.pdf, \2.pdf. \3.pdf etc,..
> and i wanted to put the bookmarked files into a separate folder instead of having them in with the main pdf file..can i alter the bookmarks to reflect the new directory where they can be found.
> like     \temp\1.pdf
> 
> ????


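Yes, you can do that with iTextSharp. Bookmarks in PDFs are stored as outline entries, and a bookmark that opens another file carries an action pointing at that file, so you might want to look for examples of reading and rewriting bookmarks (SimpleBookmark) or, for links placed on the page itself, annotations. I'm very busy at the moment and will not have time to write any example code for you. Sorry about that.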
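
A rough sketch of one way to repoint remote-file bookmarks, using `SimpleBookmark` (the same helper used for the XML export earlier in the thread) together with `PdfStamper`. It assumes iTextSharp 5.x and assumes that bookmarks opening another pdf store the target under a "File" key and nested bookmarks under "Kids"; the module and method names are hypothetical, and it has not been tested against the poster's files.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text.pdf

Public Module BookmarkPathSketch
    ' Copies sourcePdf to outPdf, prefixing every remote-file bookmark target with newFolder
    Public Sub RepointBookmarks(ByVal sourcePdf As String, ByVal outPdf As String, ByVal newFolder As String)
        Dim reader As New PdfReader(sourcePdf)
        Try
            Dim bookmarks As IList(Of Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
            If bookmarks IsNot Nothing Then PrefixFiles(bookmarks, newFolder)

            Using fs As New FileStream(outPdf, FileMode.Create)
                Dim stamper As New PdfStamper(reader, fs)
                ' Replace the outline tree with the edited list
                If bookmarks IsNot Nothing Then stamper.Outlines = bookmarks
                stamper.Close()
            End Using
        Finally
            reader.Close()
        End Try
    End Sub

    Private Sub PrefixFiles(ByVal items As IList(Of Dictionary(Of String, Object)), ByVal newFolder As String)
        For Each item As Dictionary(Of String, Object) In items
            If item.ContainsKey("File") Then
                ' Keep only the file name and put it under the new folder, e.g. temp/1.pdf
                Dim fileName As String = Path.GetFileName(CStr(item("File")))
                item("File") = newFolder & "/" & fileName
            End If
            If item.ContainsKey("Kids") Then
                PrefixFiles(DirectCast(item("Kids"), IList(Of Dictionary(Of String, Object))), newFolder)
            End If
        Next
    End Sub
End Module
```

Because the source file is open while it is being read, the result goes to a new file (`outPdf`) rather than overwriting the original.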

----------


## M@dH@tter

thanks for the info,,,I'll check into that

----------


## doug_ecg

I know this thread is a little old but I am hoping to get some help !

I need to remove the restrictions from a PDF that is automatically generated from one of our systems. The software generating the PDF also generates a random 7 character password as the owner password which changes for each file. The user password is blank.

I need to be able to change the files metadata to allow our PDF store to index the documents appropriately. 

If I use the restrictions remover in PdfManipulation2.vb (a great bit of code, btw), it does not remove the owner password, but it does change all the listed permissions to allowed (apart from page extraction).

When I use my code to change the metadata, I get the exception "PdfReader not opened with owner password".

My reading of the code in PdfManipulation2.vb is that it should create a new PDF with the contents of the old one, but with no encryption and no restrictions. Have I got this wrong? Can anyone advise a better way of doing this?

thanks

----------


## stanav

> ....................
> my reading of the code in pdfmanipulation2.vb is that it should create a new PDf with the contents of the old, but it should have no encryption and have no restrictions - have I got this wrong ?, can anyone advise a better way of doing this ?
> 
> thanks


Yes, you got the wrong idea... Removing restrictions means just that. It won't remove the passwords.

----------


## doug_ecg

> Yes, you got the wrong idea... Removing restrictions means just that. It won't remove the passwords.


Ah indeed !

Do you by any chance know of a way to alter the metadata without using the owner password ?

With restrictions removed is it possible to copy the contents to a new PDF ?

Thanks for your assistance !

B

----------


## stanav

Print it to a new PDF using a PDF print driver such as CutePDF... The printed version of the file (read: "new copy") won't have any passwords.

----------


## doug_ecg

Thanks - that does work, I was rather hoping to handle it all within my application.

Essentially I need to alter the document, add new metadata and then save it again. Rather irritatingly, the company that produces the other piece of kit that generates the PDFs appears to use a random owner password (and doesn't wish to change their software just so we can index the pdfs in ours).

Does anyone by any chance know of a way to achieve this? If not, I guess I shall have to get CutePDF, capture the created file and then work with that.

----------


## stanav

> Thanks - that does work, I was rather hoping to handle it all within my application.
> 
> Essentially I need to alter the document, add new metadata and then save it again. Rather irritatingly the company that produces the other piece of kit that generates the PDFs appears to use a random owner password ( and doesnt wish to change their software just so we can index the pdfs in ours)
> 
> Does anyone by any chance know of a way to achieve this ?, if not I guess I shall have to get cutepdf, capture the created file and then work with that.


The custom version of PDF Writer (read: "paid version") allows you to bypass the "save as" dialog window, and thus you can silently print to pdf from your app.
More info can be found on their web site:
http://www.cutepdf.com/Solutions/pdfwriter.asp

Truly, if this is for business, the one-time $500 price tag of the "Custom PDF Writer with programmatic access" package is justifiable.

----------


## nbrege

Stanav ... your AddWatermarkText function requires both a source file and a destination file.  Is it possible to rewrite this function to require only one file?  In your current function, using the same file for both source & destination results in an error.  I just want to specify a filename and the watermark text & have the function add the text to that file.  Is this possible?

----------


## stanav

You have the source code and thus you can see how it works. If it doesn't meet your needs, feel free to modify it the way you want...
Whenever you edit a pdf file, it has to be saved as a new file for 2 reasons: 1. pdf files are not designed to be editable. 2. The source pdf file is open (since you're reading from it to make the edit), and thus the file is locked. You can't delete it until the file is closed. Now that you know this, it should be fairly straightforward to modify the existing code to do what you want... That is, instead of passing in an output file path, you declare this variable locally and generate a random temporary file name for it. When done adding the watermarks, after closing the original file, you delete it and move the temp file to replace the original file.
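
The temp-file approach can be sketched with plain .NET file calls. The `EditInPlace` helper and its `edit` callback are hypothetical names; the callback stands in for whatever method takes a source path and an output path (for example the watermark function), and it is assumed to have closed both files by the time it returns.

```vb
Imports System
Imports System.IO

Public Module InPlaceEditSketch
    ' Runs edit(source, temp), then replaces the original file with the temp file
    Public Sub EditInPlace(ByVal pdfPath As String, ByVal edit As Action(Of String, String))
        Dim fullPath As String = Path.GetFullPath(pdfPath)
        ' Keep the temp file in the same folder so the final move stays on one drive
        Dim tempPath As String = Path.Combine(Path.GetDirectoryName(fullPath), Path.GetRandomFileName())
        Try
            edit(fullPath, tempPath)
            File.Delete(fullPath)
            File.Move(tempPath, fullPath)
        Catch
            ' Leave the original untouched and clean up the partial output
            If File.Exists(tempPath) Then File.Delete(tempPath)
            Throw
        End Try
    End Sub
End Module
```

A call would look something like `EditInPlace("c:\temp\test.pdf", Sub(src, dest) SomeEditMethod(src, dest))`, where `SomeEditMethod` is a placeholder for the real source/destination function.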

----------


## deltawebdesigns

Stanav,

Great help in my project.  Just running into a couple of small errors and wondered if you could point me right direction in fixing them.

Error	1	Value of type '1-dimensional array of Byte' cannot be converted to 'iTextSharp.text.pdf.RandomAccessFileOrArray'.	C:\VB.Net\PDFMerge_Window\PDFMerge_Console\PdfManipulation2.vb	138	65	PDFMerge_Window

Error	2	'MessageBox' is not declared. It may be inaccessible due to its protection level.	C:\VB.Net\PDFMerge_Window\PDFMerge_Console\PdfManipulation2.vb	179	13	PDFMerge_Window

----------


## eastken

When I searched for a tutorial on extracting PDF pages using iText, I saw a link that reprints this topic here: http://zh.scribd.com/doc/208204720/V...BForums#scribd
It seems the document has to be downloaded to view all of the posts.

----------


## kkrishnaji

hi @ stanav,

Thank you so much for your wonderful work.  

I have an Excel file which contains one column for the name and another column for the email. I also have a number of pdf files in a folder.

Is it possible to pick a pdf based on the Name column and insert the email on that PDF's first page?

It would be very helpful, as it would save me a huge manual task.

Thank you so much in advance.

----------


## txgeekgirl2

> Yes, I THINK it is quite possible, but it would involve much more work (obviously). I gave it a shot as seen in the code below but frankly, the method I was using only works to some extent. It only preserves the 1st level bookmarks. My approach was to export the bookmarks in the original pdf to a collection, select the pages to be extracted from the reader, and use pdfstamper to copy the original pdf (with now only the selected pages) to a new pdf. Since pdfstamper automatically preserves ALL the bookmarks from the original, I had to edit the bookmark collection to remove the unused ones. This approach should work but I don't know why it only preserves 1st level bookmarks. Some more work is needed to work that bug out, but I don't have the time right now. I will post just what I have so far.
> 
> vb.net Code:
> ''' <summary>
    ''' Extract pages from an existing pdf file to create a new pdf with bookmarks preserved
    ''' </summary>
    ''' <param name="sourcePdf">full path to sthe source pdf</param>
    ''' <param name="pageNumbersToExtract">an integer array containing the page number of the pages to be extracted</param>
    ''' <param name="outPdf">the full path to the output pdf</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
         Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim outlines As System.Collections.ArrayList = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
        Dim hshTable As System.Collections.Hashtable = Nothing
        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            outlines = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)
            reader.SelectPages(New System.Collections.ArrayList(pageNumbersToExtract))
            stamper = New iTextSharp.text.pdf.PdfStamper(reader, New IO.FileStream(outPdf, IO.FileMode.Create))
            RemoveUnusedBookmarks(outlines, pageNumbersToExtract)
            stamper.Outlines = outlines
            stamper.Close()
            reader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub
     Private Shared Sub RemoveUnusedBookmarks(ByRef bookmarks As System.Collections.ArrayList, ByVal pagesToKeep() As Integer)
        Dim bookmark As System.Collections.Hashtable = Nothing
        Dim obj As Object = Nothing
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            obj = bookmarks(i)
            If TypeOf obj Is System.Collections.ArrayList Then
                RemoveUnusedBookmarks(DirectCast(obj, System.Collections.ArrayList), pagesToKeep)
            ElseIf TypeOf obj Is System.Collections.Hashtable Then
                bookmark = DirectCast(obj, System.Collections.Hashtable)
                If bookmark.ContainsKey("Page") Then
                    Dim value As String = DirectCast(bookmark.Item("Page"), String)
                    If Not String.IsNullOrEmpty(value) Then
                        Dim parts() As String = value.Split(" "c)
                        If parts.Length > 0 Then
                            Dim pageNum As Integer = -1
                            If Integer.TryParse(parts(0), pageNum) Then
                                Dim idx As Integer = System.Array.IndexOf(pagesToKeep, pageNum)
                                If idx < 0 Then
                                    bookmarks.Remove(obj)
                                Else
                                    parts(0) = (idx + 1).ToString
                                    value = String.Join(" ", parts)
                                    bookmark.Item("Page") = value
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next
    End Sub
> 
> Another approach I thought of was to export the original bookmarks to an XML file and edit that file. Once done, import it back into the new pdf file (which contains only the extracted pages). But like I said, I currently do not have a lot of free time to play with it. So I leave it to you to try 
> 
> Good luck.



THANK YOU Stanav - You got me further than I had been in two days.  

I tweaked it a bit in order to return the actual page number, so I can build a call for iTextSharp to recompile the pdf with only the pages needed, based on what is found on a page. 



```
Public Shared Function SearchTextFromPdf(ByVal sourcePdf As String, ByVal searchPhrase As String) As Integer
        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing

        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)

            For i As Integer = 1 To reader.NumberOfPages
                Dim pageText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i)

                'Return the 1-based number of the first page that contains the phrase
                If pageText.Contains(searchPhrase) Then
                    Return i
                End If
            Next
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        Finally
            'Runs on every exit path, including the early Return above
            If reader IsNot Nothing Then reader.Close()
        End Try
        '0 means the phrase was not found (or an error occurred)
        Return 0
    End Function
```

----------


## vswargam

Hi, I am trying to create a PDF file from a text file using iText. I am having a hard time figuring out how to get the data without losing the formatting of the text file.

ReadLine and ReadToEnd are not giving me what I want. 
Can you please guide me in achieving this task?

Thanks

----------


## stanav

> Hi , I am trying to create PDF file from the text file using Itext . i am facing some hard time to figure out how get the data without missing the format in text file.
> 
> Readline, ReadtoEnd are not providing me what i want. 
> Can you please guide me in achieving the  task?
> 
> Thanks


Easiest method is to use a PDF print driver, such as Cute PDF... All you have to do is to use the default app for that document type (i.e. Notepad for .txt, MS Word for .docx...) to open the file and send print command to it, specifying the pdf print driver as the printer. So basically, instead of printing out on paper, it now creates a pdf file for you.

----------


## vswargam

Thank you very much. But I have to insert a logo at the top of the file as part of the conversion, and for that purpose iText seems the right fit.
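
A minimal sketch of what that could look like with iTextSharp 5.x: the logo goes in first, then each line of the text file is added as its own paragraph in a fixed-width font so that column alignment has a chance of surviving. The module and method names, the page size, the font size and the logo dimensions are all assumptions, and whether leading spaces come through exactly as in the text file should be checked on real data.

```vb
Imports System.IO
Imports Microsoft.VisualBasic
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module TextToPdfSketch
    ' Hypothetical helper: logo at the top, then the text file line by line
    Public Sub TextFileToPdf(ByVal textPath As String, ByVal logoPath As String, ByVal outPdf As String)
        Dim doc As New Document(PageSize.A4)
        Using fs As New FileStream(outPdf, FileMode.Create)
            PdfWriter.GetInstance(doc, fs)
            doc.Open()

            ' Fully qualified to avoid clashing with System.Drawing.Image and Font
            Dim logo As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(logoPath)
            logo.ScaleToFit(150.0F, 60.0F)
            doc.Add(logo)

            Dim mono As iTextSharp.text.Font = FontFactory.GetFont(FontFactory.COURIER, 9.0F)
            For Each line As String In File.ReadAllLines(textPath)
                ' Tabs have no fixed width in a pdf, so expand them to spaces
                Dim content As String = line.Replace(vbTab, "    ")
                ' An empty paragraph takes no space, so use a single blank to keep empty lines
                If content.Length = 0 Then content = " "
                doc.Add(New Paragraph(content, mono))
            Next

            doc.Close()
        End Using
    End Sub
End Module
```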

----------


## nagu0006

> Stanav ... thanks for posting these code samples.  They helped me on a project that I am currently working on.  I would like to request that you post another sample:  I need to be able to extract specified pages from multiple documents & save them to one combined PDF.  ie. take pages 3 & 7 from Doc1.pdf, 4-6 from Doc2.pdf & 1, 5 & 12 from Doc3.pdf and save them in Doc4.pdf  Is this "do-able"?



Hi,

If you got a solution for this, kindly share the code.

Thanks in Advance  ..
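
For what it's worth, the task in the quoted request (specific pages from several documents combined into one pdf) can be sketched with iTextSharp 5.x's `PdfCopy`. This is not code from the PdfManipulation2 class; the module name, method name and the shape of the `selections` parameter are made up for illustration.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module CombinePagesSketch
    ' selections: each entry pairs a source pdf path with the 1-based page numbers to take from it.
    ' At least one page must be selected, otherwise closing the document fails.
    Public Sub CombinePages(ByVal selections As IList(Of KeyValuePair(Of String, Integer())), ByVal outPdf As String)
        Dim doc As New Document()
        Using fs As New FileStream(outPdf, FileMode.Create)
            Dim copy As New PdfCopy(doc, fs)
            doc.Open()

            For Each sel As KeyValuePair(Of String, Integer()) In selections
                Dim reader As New PdfReader(sel.Key)
                Try
                    For Each pageNum As Integer In sel.Value
                        copy.AddPage(copy.GetImportedPage(reader, pageNum))
                    Next
                    copy.FreeReader(reader)
                Finally
                    reader.Close()
                End Try
            Next

            doc.Close()
        End Using
    End Sub
End Module
```

For the example in the quote, the list would hold three entries: Doc1.pdf with {3, 7}, Doc2.pdf with {4, 5, 6} and Doc3.pdf with {1, 5, 12}, with Doc4.pdf as `outPdf`.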

----------

