Class PDDocument

  • All Implemented Interfaces:
    java.awt.print.Pageable, java.io.Closeable, java.lang.AutoCloseable
    Direct Known Subclasses:
    ConformingPDDocument

    public class PDDocument
    extends java.lang.Object
    implements java.awt.print.Pageable, java.io.Closeable
    This is the in-memory representation of the PDF document. You must call close() on this object when you are done using it.

    This class implements the Pageable interface, but since PDFBox version 1.3.0 you should be using the PDPageable adapter instead (see PDFBOX-788).
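The close() requirement above can be met with try-with-resources, assuming Java 7 or later. The sketch below uses only the JDK: TrackedResource and CloseDemo are hypothetical stand-ins, not PDFBox classes. Because PDDocument implements java.io.Closeable, the same pattern should apply to a document instance.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a document-like resource (not a PDFBox class).
class TrackedResource implements Closeable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true;
    }
}

class CloseDemo {
    // Uses the resource and relies on try-with-resources to call close(),
    // whether the body completes normally or throws.
    static TrackedResource useAndClose(boolean failDuringUse) {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            if (failDuringUse) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (IOException | IllegalStateException e) {
            // close() has already run by the time this block executes
        }
        return resource;
    }
}
```

With a real document, the variable declared in the try header would be the PDDocument itself, for example one returned by a load method.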

    Version:
    $Revision: 1.47 $
    Author:
    Ben Litchfield
    • Constructor Detail

      • PDDocument

        public PDDocument()
        Creates a new PDF document with no pages. You need to add at least one page for the document to be valid.
      • PDDocument

        public PDDocument​(COSDocument doc)
        Constructor that uses an existing document. The COSDocument that is passed in must be valid.
        Parameters:
        doc - The COSDocument that this document wraps.
      • PDDocument

        public PDDocument​(COSDocument doc,
                          BaseParser usedParser)
        Constructor that uses an existing document. The COSDocument that is passed in must be valid.
        Parameters:
        doc - The COSDocument that this document wraps.
        usedParser - the parser which is used to read the pdf
    • Method Detail

      • getPageMap

        public final java.util.Map<java.lang.String,​java.lang.Integer> getPageMap()
        This will return the Map containing the mapping from object IDs to page numbers.
        Returns:
        the pageMap
      • addPage

        public void addPage​(PDPage page)
        This will add a page to the document. This is a convenience method that adds the page to the root of the hierarchy and sets the parent of the page to the root.
        Parameters:
        page - The page to add to the document.
      • addSignature

        public void addSignature​(PDSignature sigObject,
                                 SignatureInterface signatureInterface)
                          throws java.io.IOException,
                                 SignatureException
        Add a signature.
        Parameters:
        sigObject - is the PDSignature model
        signatureInterface - is an interface which provides signing capabilities
        Throws:
        java.io.IOException - if there is an error creating required fields
        SignatureException - if something went wrong
      • addSignature

        public void addSignature​(PDSignature sigObject,
                                 SignatureInterface signatureInterface,
                                 SignatureOptions options)
                          throws java.io.IOException,
                                 SignatureException
        This will add a signature to the document.
        Parameters:
        sigObject - is the PDSignature model
        signatureInterface - is an interface which provides signing capabilities
        options - signature options
        Throws:
        java.io.IOException - if there is an error creating required fields
        SignatureException - if something went wrong
      • addSignatureField

        public void addSignatureField​(java.util.List<PDSignatureField> sigFields,
                                      SignatureInterface signatureInterface,
                                      SignatureOptions options)
                               throws java.io.IOException,
                                      SignatureException
        This will add a signature field to the document.
        Parameters:
        sigFields - are the PDSignatureFields that should be added to the document
        signatureInterface - is an interface which provides signing capabilities
        options - signature options
        Throws:
        java.io.IOException - if there is an error creating required fields
        SignatureException
      • removePage

        public boolean removePage​(PDPage page)
        Remove the page from the document.
        Parameters:
        page - The page to remove from the document.
        Returns:
        true if the page was found, false otherwise.
      • removePage

        public boolean removePage​(int pageNumber)
        Remove the page from the document.
        Parameters:
        pageNumber - Zero-based index of the page to remove.
        Returns:
        true if the page was found, false otherwise.
      • importPage

        public PDPage importPage​(PDPage page)
                          throws java.io.IOException
        This will import and copy the contents from another location. Currently the content stream is stored in a scratch file, which is associated with the document. If you are adding a page to this document from another document and want to copy the contents to this document's scratch file, use this method; otherwise just use the addPage method. Unlike addPage(org.apache.pdfbox.pdmodel.PDPage), this method does a deep copy. If your page has annotations that link to pages not in the target document, the target document might become huge. To avoid this, delete the page references of such annotations. See here for how to do this.
        Parameters:
        page - The page to import.
        Returns:
        The page that was imported.
        Throws:
        java.io.IOException - If there is an error copying the page.
      • getDocument

        public COSDocument getDocument()
        This will get the low level document.
        Returns:
        The document that this layer sits on top of.
      • getDocumentInformation

        public PDDocumentInformation getDocumentInformation()
        This will get the document info dictionary. This is guaranteed to not return null.
        Returns:
        The document's /Info dictionary
      • setDocumentInformation

        public void setDocumentInformation​(PDDocumentInformation info)
        This will set the document information for this document.
        Parameters:
        info - The updated document information.
      • getDocumentCatalog

        public PDDocumentCatalog getDocumentCatalog()
        This will get the document catalog. This is guaranteed to not return null.
        Returns:
        The document's /Root dictionary
      • isEncrypted

        public boolean isEncrypted()
        This will tell if this document is encrypted or not.
        Returns:
        true if this document is encrypted.
      • getEncryptionDictionary

        public PDEncryptionDictionary getEncryptionDictionary()
                                                       throws java.io.IOException
        This will get the encryption dictionary for this document. This will still return the parameters if the document was decrypted. If the document was never encrypted then this will return null. As the encryption architecture in PDF documents is pluggable, this returns an abstract class, but the only supported subclass at this time is PDStandardEncryption.
        Returns:
        The encryption dictionary (most likely a PDStandardEncryption object)
        Throws:
        java.io.IOException - If there is an error determining which security handler to use.
      • setEncryptionDictionary

        public void setEncryptionDictionary​(PDEncryptionDictionary encDictionary)
                                     throws java.io.IOException
        This will set the encryption dictionary for this document.
        Parameters:
        encDictionary - The encryption dictionary (most likely a PDStandardEncryption object)
        Throws:
        java.io.IOException - If there is an error determining which security handler to use.
      • getSignatureDictionary

        @Deprecated
        public PDSignature getSignatureDictionary()
                                           throws java.io.IOException
        Deprecated.
        This will return the last signature.
        Returns:
        the last signature as PDSignature.
        Throws:
        java.io.IOException - if no document catalog can be found.
      • getLastSignatureDictionary

        public PDSignature getLastSignatureDictionary()
                                               throws java.io.IOException
        This will return the last signature.
        Returns:
        the last signature as PDSignature.
        Throws:
        java.io.IOException - if no document catalog can be found.
      • getSignatureFields

        public java.util.List<PDSignatureField> getSignatureFields()
                                                            throws java.io.IOException
        Retrieve all signature fields from the document.
        Returns:
        a List of PDSignatureFields
        Throws:
        java.io.IOException - if no document catalog can be found.
      • getSignatureDictionaries

        public java.util.List<PDSignature> getSignatureDictionaries()
                                                             throws java.io.IOException
        Retrieve all signature dictionaries from the document.
        Returns:
        a List of PDSignatures
        Throws:
        java.io.IOException - if no document catalog can be found.
      • isUserPassword

        @Deprecated
        public boolean isUserPassword​(java.lang.String password)
                               throws java.io.IOException,
                                      CryptographyException
        Deprecated.
        This will determine if this is the user password. This only applies when the document is encrypted and uses standard encryption.
        Parameters:
        password - The plain text user password.
        Returns:
        true if the password passed in matches the user password used to encrypt the document.
        Throws:
        java.io.IOException - If there is an error determining if it is the user password.
        CryptographyException - If there is an error in the encryption algorithms.
      • isOwnerPassword

        @Deprecated
        public boolean isOwnerPassword​(java.lang.String password)
                                throws java.io.IOException,
                                       CryptographyException
        Deprecated.
        This will determine if this is the owner password. This only applies when the document is encrypted and uses standard encryption.
        Parameters:
        password - The plain text owner password.
        Returns:
        true if the password passed in matches the owner password used to encrypt the document.
        Throws:
        java.io.IOException - If there is an error determining if it is the owner password.
        CryptographyException - If there is an error in the encryption algorithms.
      • decrypt

        public void decrypt​(java.lang.String password)
                     throws CryptographyException,
                            java.io.IOException
        This will decrypt a document. This method is provided for compatibility reasons only. Users should use the new security layer instead, in particular the openProtection method.

        Do not call this method if you have opened your document with one of the loadNonSeq methods.

        Parameters:
        password - Either the user or owner password.
        Throws:
        CryptographyException - If there is an error decrypting the document.
        java.io.IOException - If there is an error getting the stream data.
      • wasDecryptedWithOwnerPassword

        @Deprecated
        public boolean wasDecryptedWithOwnerPassword()
        Deprecated.
        use getCurrentAccessPermission instead
        This will tell if the document was decrypted with the master password. This entry is invalid if the PDF was not decrypted.
        Returns:
        true if the pdf was decrypted with the master password.
      • encrypt

        public void encrypt​(java.lang.String ownerPassword,
                            java.lang.String userPassword)
                     throws CryptographyException,
                            java.io.IOException
        This will mark a document to be encrypted. The actual encryption will occur when the document is saved. This method is provided for compatibility reasons only. Users should use the new security layer instead, in particular the openProtection method.
        Parameters:
        ownerPassword - The owner password to encrypt the document.
        userPassword - The user password to encrypt the document.
        Throws:
        CryptographyException - If an error occurs during encryption.
        java.io.IOException - If there is an error accessing the data.
      • getOwnerPasswordForEncryption

        @Deprecated
        public java.lang.String getOwnerPasswordForEncryption()
        Deprecated.
        Do not rely on this method anymore.
        The owner password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.
        Returns:
        The owner password passed to the encrypt method.
      • getUserPasswordForEncryption

        @Deprecated
        public java.lang.String getUserPasswordForEncryption()
        Deprecated.
        Do not rely on this method anymore.
        The user password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.
        Returns:
        The user password passed to the encrypt method.
      • willEncryptWhenSaving

        @Deprecated
        public boolean willEncryptWhenSaving()
        Deprecated.
        Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state
        Internal method to determine if the document will be encrypted when it is saved.
        Returns:
        True if encrypt has been called and the document has not been saved yet.
      • clearWillEncryptWhenSaving

        @Deprecated
        public void clearWillEncryptWhenSaving()
        Deprecated.
        Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
        This should only be called by the COSWriter after encryption has completed.
      • load

        public static PDDocument load​(java.net.URL url)
                               throws java.io.IOException
        This will load a document from a url.
        Parameters:
        url - The url to load the PDF from.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.net.URL url,
                                      boolean force)
                               throws java.io.IOException
        This will load a document from a url. Allows for skipping corrupt pdf objects.
        Parameters:
        url - The url to load the PDF from.
        force - When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.net.URL url,
                                      RandomAccess scratchFile)
                               throws java.io.IOException
        This will load a document from a url.
        Parameters:
        url - The url to load the PDF from.
        scratchFile - A location to store temp PDFBox data for this document.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.lang.String filename)
                               throws java.io.IOException
        This will load a document from a file.
        Parameters:
        filename - The name of the file to load.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.lang.String filename,
                                      boolean force)
                               throws java.io.IOException
        This will load a document from a file. Allows for skipping corrupt pdf objects
        Parameters:
        filename - The name of the file to load.
        force - When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.lang.String filename,
                                      RandomAccess scratchFile)
                               throws java.io.IOException
        This will load a document from a file.
        Parameters:
        filename - The name of the file to load.
        scratchFile - A location to store temp PDFBox data for this document.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.io.File file)
                               throws java.io.IOException
        This will load a document from a file.
        Parameters:
        file - The file to load.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.io.File file,
                                      RandomAccess scratchFile)
                               throws java.io.IOException
        This will load a document from a file.
        Parameters:
        file - The file to load.
        scratchFile - A location to store temp PDFBox data for this document.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.io.InputStream input)
                               throws java.io.IOException
        This will load a document from an input stream.
        Parameters:
        input - The stream that contains the document.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.io.InputStream input,
                                      boolean force)
                               throws java.io.IOException
        This will load a document from an input stream. Allows for skipping corrupt pdf objects
        Parameters:
        input - The stream that contains the document.
        force - When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.io.InputStream input,
                                      RandomAccess scratchFile)
                               throws java.io.IOException
        This will load a document from an input stream.
        Parameters:
        input - The stream that contains the document.
        scratchFile - A location to store temp PDFBox data for this document.
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • load

        public static PDDocument load​(java.io.InputStream input,
                                      RandomAccess scratchFile,
                                      boolean force)
                               throws java.io.IOException
        This will load a document from an input stream. Allows for skipping corrupt pdf objects
        Parameters:
        input - The stream that contains the document.
        scratchFile - A location to store temp PDFBox data for this document.
        force - When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
        Returns:
        The document that was loaded.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • loadNonSeq

        public static PDDocument loadNonSeq​(java.io.File file,
                                            RandomAccess scratchFile)
                                     throws java.io.IOException
        Parses PDF with the new non sequential parser and an empty password.
        Parameters:
        file - file to be loaded
        scratchFile - location to store temp PDFBox data for this document
        Returns:
        loaded document
        Throws:
        java.io.IOException - in case of a file reading or parsing error
      • loadNonSeq

        public static PDDocument loadNonSeq​(java.io.File file,
                                            RandomAccess scratchFile,
                                            java.lang.String password)
                                     throws java.io.IOException
        Parses PDF with the new non sequential parser and the given password.
        Parameters:
        file - file to be loaded
        scratchFile - location to store temp PDFBox data for this document
        password - password to be used for decryption
        Returns:
        loaded document
        Throws:
        java.io.IOException - in case of a file reading or parsing error
      • loadNonSeq

        public static PDDocument loadNonSeq​(java.io.InputStream input,
                                            RandomAccess scratchFile)
                                     throws java.io.IOException
        Parses PDF with the new non sequential parser.
        Parameters:
        input - stream that contains the document.
        scratchFile - location to store temp PDFBox data for this document
        Returns:
        loaded document
        Throws:
        java.io.IOException - in case of a file reading or parsing error
      • loadNonSeq

        public static PDDocument loadNonSeq​(java.io.InputStream input,
                                            RandomAccess scratchFile,
                                            java.lang.String password)
                                     throws java.io.IOException
        Parses PDF with the new non sequential parser.
        Parameters:
        input - stream that contains the document.
        scratchFile - location to store temp PDFBox data for this document
        password - password to be used for decryption
        Returns:
        loaded document
        Throws:
        java.io.IOException - in case of a file reading or parsing error
      • save

        public void save​(java.lang.String fileName)
                  throws java.io.IOException,
                         COSVisitorException
        Save the document to a file.
        Parameters:
        fileName - The file to save as.
        Throws:
        java.io.IOException - If there is an error saving the document.
        COSVisitorException - If an error occurs while generating the data.
      • save

        public void save​(java.io.File file)
                  throws java.io.IOException,
                         COSVisitorException
        Save the document to a file.
        Parameters:
        file - The file to save as.
        Throws:
        java.io.IOException - If there is an error saving the document.
        COSVisitorException - If an error occurs while generating the data.
      • save

        public void save​(java.io.OutputStream output)
                  throws java.io.IOException,
                         COSVisitorException
        This will save the document to an output stream.
        Parameters:
        output - The stream to write to.
        Throws:
        java.io.IOException - If there is an error writing the document.
        COSVisitorException - If an error occurs while generating the data.
      • saveIncremental

        public void saveIncremental​(java.lang.String fileName)
                             throws java.io.IOException,
                                    COSVisitorException
        Save the pdf as incremental for signing. Use this only for small files because this method temporarily stores the entire file in memory.
        Parameters:
        fileName - the filename to be used. This should be a copy of the original file.
        Throws:
        java.io.IOException - if something went wrong
        COSVisitorException - if something went wrong
      • saveIncremental

        public void saveIncremental​(java.io.InputStream input,
                                    java.io.OutputStream output)
                             throws java.io.IOException,
                                    COSVisitorException
        Save the pdf as incremental for signing. See the signature examples sources on how to use this.
        Parameters:
        input - This must be a FileInputStream or it won't work. It should point to the same file as the output parameter.
        output - This must be a FileOutputStream or it won't work. It must be positioned at the end of the file, i.e. it should just have written the original file. The appending constructor of FileOutputStream has been found not to work, so you need to write the whole file yourself.
        Throws:
        java.io.IOException - if something went wrong
        COSVisitorException - if something went wrong
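The stream requirements for saveIncremental(InputStream, OutputStream) can be prepared with plain JDK I/O. This is only a sketch of one way to satisfy the description above, not the approach used in the PDFBox signature examples: IncrementalSavePrep and copyAndKeepOpen are hypothetical names. The helper writes the whole original file through a non-appending FileOutputStream and returns the stream still open, so that it is positioned at the end of the copied data.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

class IncrementalSavePrep {
    // Copies 'original' into 'target' and returns the still-open output stream,
    // which is then positioned directly after the copied bytes.
    static FileOutputStream copyAndKeepOpen(File original, File target) throws IOException {
        // Deliberately not the appending constructor (see the parameter notes above).
        FileOutputStream out = new FileOutputStream(target);
        try (InputStream in = new FileInputStream(original)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            out.close(); // do not leak the stream if the copy fails
            throw e;
        }
        return out;
    }
}
```

Under these assumptions, a caller would open a new FileInputStream on the same target file, pass both streams to saveIncremental, and close them afterwards.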
      • getPageCount

        @Deprecated
        public int getPageCount()
        Deprecated.
        Use the getNumberOfPages method instead.
        This will return the total page count of the PDF document. Note: This method is deprecated in favor of the getNumberOfPages method, which is a required method of the Pageable interface. This method will be removed in a future version of PDFBox.
        Returns:
        The total number of pages in the PDF document.
      • getNumberOfPages

        public int getNumberOfPages()
        Specified by:
        getNumberOfPages in interface java.awt.print.Pageable
      • getPageFormat

        @Deprecated
        public java.awt.print.PageFormat getPageFormat​(int pageIndex)
        Deprecated.
        Use the PDPageable adapter class
        Returns the format of the page at the given index when using a default printer job returned by PrinterJob.getPrinterJob().
        Specified by:
        getPageFormat in interface java.awt.print.Pageable
        Parameters:
        pageIndex - page index, zero-based
        Returns:
        page format
      • getPrintable

        public java.awt.print.Printable getPrintable​(int pageIndex)
        Specified by:
        getPrintable in interface java.awt.print.Pageable
      • print

        public void print​(java.awt.print.PrinterJob printJob)
                   throws java.awt.print.PrinterException
        Parameters:
        printJob - The printer job.
        Throws:
        java.awt.print.PrinterException - If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
        See Also:
        print()
      • print

        public void print()
                   throws java.awt.print.PrinterException
        This will send the PDF document to a printer. The printing functionality depends on the org.apache.pdfbox.pdfviewer.PageDrawer functionality. The PageDrawer is a work in progress, so some PDFs will print correctly and some will not. This is a convenience method that creates the java.awt.print.PrinterJob. PDDocument implements the java.awt.print.Pageable interface and PDPage implements the java.awt.print.Printable interface, so advanced printing can be done by using those interfaces instead of this method.
        Throws:
        java.awt.print.PrinterException - If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
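As one illustration of what using the Pageable interface directly can look like, the sketch below wraps any java.awt.print.Pageable and exposes only a contiguous range of its pages. It uses JDK classes only. PageRangePageable is a hypothetical helper, not part of PDFBox, and it assumes the wrapped source reports a known page count.

```java
import java.awt.Graphics;
import java.awt.print.PageFormat;
import java.awt.print.Pageable;
import java.awt.print.Printable;
import java.awt.print.PrinterException;

// Exposes pages [first, first + count) of another Pageable.
class PageRangePageable implements Pageable {
    private final Pageable source;
    private final int first;
    private final int count;

    PageRangePageable(Pageable source, int first, int count) {
        if (first < 0 || count < 0 || count > source.getNumberOfPages() - first) {
            throw new IndexOutOfBoundsException("page range outside source");
        }
        this.source = source;
        this.first = first;
        this.count = count;
    }

    @Override
    public int getNumberOfPages() {
        return count;
    }

    @Override
    public PageFormat getPageFormat(int pageIndex) {
        return source.getPageFormat(translate(pageIndex));
    }

    @Override
    public Printable getPrintable(int pageIndex) {
        final int sourceIndex = translate(pageIndex);
        final Printable delegate = source.getPrintable(sourceIndex);
        // Pass the source's own page index on, not the index within the range.
        return new Printable() {
            @Override
            public int print(Graphics g, PageFormat pf, int ignoredIndex) throws PrinterException {
                return delegate.print(g, pf, sourceIndex);
            }
        };
    }

    // Maps an index within the range to an index in the wrapped Pageable.
    private int translate(int pageIndex) {
        if (pageIndex < 0 || pageIndex >= count) {
            throw new IndexOutOfBoundsException("page index " + pageIndex);
        }
        return first + pageIndex;
    }
}
```

A wrapper like this could be handed to PrinterJob.setPageable before calling PrinterJob.print(). Whether the result prints correctly for a given PDF still depends on the PageDrawer caveats described above.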
      • silentPrint

        public void silentPrint()
                         throws java.awt.print.PrinterException
        This will send the PDF to the default printer without prompting the user for any printer settings.
        Throws:
        java.awt.print.PrinterException - If there is an error while printing.
        See Also:
        print()
      • silentPrint

        public void silentPrint​(java.awt.print.PrinterJob printJob)
                         throws java.awt.print.PrinterException
        This will send the PDF to the default printer without prompting the user for any printer settings.
        Parameters:
        printJob - A printer job definition.
        Throws:
        java.awt.print.PrinterException - If there is an error while printing.
        See Also:
        print()
      • close

        public void close()
                   throws java.io.IOException
        This will close the underlying COSDocument object.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - If there is an error releasing resources.
      • getCurrentAccessPermission

        public AccessPermission getCurrentAccessPermission()
        Returns the access permissions granted when the document was decrypted. If the document was not decrypted, this method returns the access permission for a document owner (i.e. can do everything). The returned object is in read-only mode so that permissions cannot be changed. Methods providing access to content should rely on this object to verify if the current user is allowed to proceed.
        Returns:
        the access permissions for the current user on the document.
      • getSecurityHandler

        public SecurityHandler getSecurityHandler()
        Get the security handler that is used for document encryption.
        Returns:
        The handler used to encrypt/decrypt the document.
      • setSecurityHandler

        public boolean setSecurityHandler​(SecurityHandler secHandler)
        Sets security handler if none is set already.
        Parameters:
        secHandler - security handler to be assigned to document
        Returns:
        true if security handler was set, false otherwise (a security handler was already set)
      • isAllSecurityToBeRemoved

        public boolean isAllSecurityToBeRemoved()
        Indicates if all security is removed or not when writing the pdf.
        Returns:
        true if all security shall be removed, false otherwise
      • setAllSecurityToBeRemoved

        public void setAllSecurityToBeRemoved​(boolean removeAllSecurity)
        Activates/Deactivates the removal of all security when writing the pdf.
        Parameters:
        removeAllSecurity - remove all security if set to true
      • getDocumentId

        public java.lang.Long getDocumentId()
      • setDocumentId

        public void setDocumentId​(java.lang.Long docId)