PDF Invoice to XML conversion

Former Member

Hi.

We are a manufacturing company and receive numerous Purchase Orders and Invoices via email as a PDF.

We spend a lot of time typing this information in. Is there software or a service out there that can convert these PDFs into usable XML so I can import it? I've tried a bunch of "PDF to XML" converters, but they only export the PDF's internal tags, which is of no use. I need an XML document that is clean and usable.

Many thanks for any help in solving this one.

Jason.

1 REPLY

Hello Jason,

I assume you need some code to extract data from an ADS-rendered PDF file, and that your ABAP system has access to SAP NetWeaver ADS (Adobe Document Services)?

In that case you might use some of the code below.

IV_INPUT is an importing parameter of type XSTRING and is assumed to be the rendered PDF file.

EV_OUTPUT is an exporting/returning parameter of type XSTRING. It holds the data part of the PDF file in XML format.

You can parse EV_OUTPUT with the iXML library.

method extract.
  data:
         lv_dest       type rfcdest,
         lr_fp         type ref to if_fp,
         lr_pdfobj     type ref to if_fp_pdf_object,
         lr_fpex       type ref to cx_fp_runtime.

  if iv_input is initial.
*   No PDF data was passed in: report it and leave the method
    write 'error spot 1'.
    return.
  endif.

* get RFC destination of the ADS connection
  lv_dest = cl_fp=>get_ads_connection( ).

* get FP reference
  lr_fp = cl_fp=>get_reference( ).

  try.

*     Create PDF Object
      lr_pdfobj = lr_fp->create_pdf_object( connection = lv_dest ).

*     Set document
      lr_pdfobj->set_document(
        pdfdata = iv_input ).

*     Tell PDF object to extract data
      call method lr_pdfobj->set_task_extractdata( ).

*     Execute, call ADS
      call method lr_pdfobj->execute( ).

*     Get data
      call method lr_pdfobj->get_data
        importing
          formdata = ev_output.


    catch cx_fp_runtime_internal
          cx_fp_runtime_system
          cx_fp_runtime_usage    into lr_fpex.

      write: 'error spot 2', lr_fpex->errmsg.
*     Some error handling

    catch cx_root.
      write: 'error spot 3'.
*     Some error handling
     
  endtry.

endmethod.
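
As for parsing EV_OUTPUT with the iXML library, a minimal sketch might look like the one below. It is only an illustration under some assumptions: the report name and the sample XML (element names INVOICE_NO and AMOUNT) are made up and stand in for whatever your form actually contains, and LV_XML takes the place of the EV_OUTPUT returned by the method above.

```abap
report z_parse_formdata. " hypothetical report name

data: lv_xml      type xstring,
      lo_ixml     type ref to if_ixml,
      lo_factory  type ref to if_ixml_stream_factory,
      lo_istream  type ref to if_ixml_istream,
      lo_document type ref to if_ixml_document,
      lo_parser   type ref to if_ixml_parser,
      lo_iterator type ref to if_ixml_node_iterator,
      lo_node     type ref to if_ixml_node,
      lv_name     type string,
      lv_value    type string.

* Stand-in for EV_OUTPUT: made-up form data, converted to UTF-8 bytes
lv_xml = cl_abap_codepage=>convert_to(
  source = `<data><INVOICE_NO>4711</INVOICE_NO><AMOUNT>100.00</AMOUNT></data>` ).

* Build the iXML objects and parse the XSTRING into a DOM document
lo_ixml     = cl_ixml=>create( ).
lo_factory  = lo_ixml->create_stream_factory( ).
lo_document = lo_ixml->create_document( ).
lo_istream  = lo_factory->create_istream_xstring( string = lv_xml ).
lo_parser   = lo_ixml->create_parser( stream_factory = lo_factory
                                      istream        = lo_istream
                                      document       = lo_document ).

if lo_parser->parse( ) <> 0.
  write / 'XML could not be parsed'.
  return.
endif.

* Walk all nodes and list each element with its text content
lo_iterator = lo_document->create_iterator( ).
lo_node     = lo_iterator->get_next( ).
while lo_node is bound.
  if lo_node->get_type( ) = if_ixml_node=>co_node_element.
    lv_name  = lo_node->get_name( ).
    lv_value = lo_node->get_value( ).
    write: / lv_name, lv_value.
  endif.
  lo_node = lo_iterator->get_next( ).
endwhile.
```

Note that GET_VALUE on an element that has child elements returns the combined text of everything below it, so for a real form you would probably pick out specific fields, for example with FIND_FROM_NAME on the document, rather than list every node.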