Looking for advice on the best language to use for reading PDF files

Discussion in 'Programming & Software Development' started by Sn@Ke, Sep 25, 2018.

  1. Sn@Ke

    Sn@Ke Member

    Joined:
    Oct 5, 2003
    Messages:
    257
    Location:
    Sunshine Coast
    Hi all

    I'm looking for advice on which language is going to be the best to use when it comes to opening a PDF file and reading its content. Specifically, what I'm trying to do is create a program that can open a PDF file and interact with the 'lines', for example to measure distances on a PDF drawing (e.g. a PDF-XChange style app).
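As a minimal sketch of the measuring part (Python, standard library only): PDF user space defaults to 72 units per inch, so the distance between two points on a drawing can be converted to paper millimetres, and then to real-world units if the drawing scale is known. The function names and the `scale_denominator` parameter are hypothetical, and the sketch assumes the page uses the default unit (no /UserUnit override) and that both points are already in page coordinates.

```python
import math

# Default PDF user space: 72 units per inch (1 unit = 1/72 in).
# Assumption: the page does not override this with /UserUnit.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def pdf_distance_points(p1, p2):
    """Straight-line distance between two (x, y) points in PDF user-space units."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def real_world_mm(p1, p2, scale_denominator):
    """Convert a measured PDF distance to real-world millimetres.

    scale_denominator is the drawing scale, e.g. 100 for a 1:100 drawing.
    This is a hypothetical parameter: the scale must be known or read from
    the drawing's title block, since the PDF itself does not guarantee it.
    """
    paper_mm = pdf_distance_points(p1, p2) / POINTS_PER_INCH * MM_PER_INCH
    return paper_mm * scale_denominator
```

For example, a line from (0, 0) to (72, 0) is one inch (25.4 mm) on paper, which is 2540 mm at 1:100.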

    I'm a PHP / web dev, so I've actually built something similar with the skills I have (HTML5 canvas, PHP, jQuery), but essentially all I was doing was overlaying a transparent div, tracing the objects and getting my measurements that way. It works fine, but it's too slow... I want to be able to natively select a 'line' in a PDF and then work out details based on that.
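Selecting a 'line' natively usually comes down to hit-testing: map the mouse click into page coordinates, then pick the nearest extracted line segment within some tolerance. A minimal sketch, assuming the segments have already been pulled out of the PDF as ((x0, y0), (x1, y1)) pairs and the click has been converted into the same coordinate system; `point_segment_distance` and `pick_segment` are hypothetical helper names.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (all (x, y) tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line, then clamp the parameter to [0, 1]
    # so the closest point stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)


def pick_segment(click, segments, tolerance):
    """Index of the segment nearest to click, or None if none is within tolerance.

    Assumption: click and segments share the same (page) coordinate system.
    """
    best_index, best_dist = None, tolerance
    for i, (a, b) in enumerate(segments):
        d = point_segment_distance(click, a, b)
        if d <= best_dist:
            best_index, best_dist = i, d
    return best_index
```

For example, `pick_segment((50, 3), [((0, 0), (100, 0)), ((0, 50), (100, 50))], tolerance=5)` returns `0`.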

    I know this is a very vague question, but I'm looking for people with experience in other languages to chime in, and I'll research those. I'm pretty good at picking up new languages, plus I have the time and resources... I just need some direction if possible. I was thinking Python would be the go, but I wanted some advice.
     
  2. neRok

    neRok Member

    Joined:
    Aug 19, 2006
    Messages:
    2,680
    Location:
    Perth NOR
    [Not a pro...] I like Python for some things, but building GUIs and packaging as an EXE for Windows don't look easy. I would look at .NET with those requirements.
     
  3. elh9

    elh9 Member

    Joined:
    Feb 28, 2016
    Messages:
    97
    Location:
    Perth NOR
    You could drop the PDF into a CAD program (most will handle it) and measure from there.
    I think Apache PDFBox would be a good option. It forces you onto Java, though.
     
  4. Quadbox

    Quadbox Member

    Joined:
    Jun 27, 2001
    Messages:
    6,033
    Location:
    Brisbane
    This is a pretty broad question; PDFs are not exactly a homogeneous format. I mean, are you talking about PDFs containing vector graphics, lossy-compressed raster graphics, or losslessly compressed raster graphics? They can contain any of the above. Or are you talking about the text?
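For the vector case, straight lines in a PDF are drawn by path operators in the page's content stream (`m` moves to a point, `l` adds a line to a point, `re` appends a rectangle, `h` closes the subpath). The sketch below collects those into segments. It is a simplification under stated assumptions: the stream is already decompressed (real streams are usually Flate-compressed), the current transformation matrix set by `cm` is ignored, and strings, comments and inline images are not handled. A library such as PDFBox does this properly; `extract_segments` is a hypothetical name.

```python
def extract_segments(content):
    """Collect straight line segments from an uncompressed PDF content stream.

    Simplifications (assumptions for illustration): only the m, l, re and h
    path operators are handled, the current transformation matrix (cm) is
    ignored, and the stream contains no strings, comments or inline images.
    """
    segments = []
    operands = []
    current = None        # current point
    subpath_start = None  # start of the current subpath, used by h (closepath)
    for token in content.split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number: treat it as an operator (or a name, ignored)
        if token == "m" and len(operands) >= 2:
            current = subpath_start = (operands[-2], operands[-1])
        elif token == "l" and len(operands) >= 2 and current is not None:
            end = (operands[-2], operands[-1])
            segments.append((current, end))
            current = end
        elif token == "re" and len(operands) >= 4:
            # x y width height re: a closed four-sided subpath.
            x, y, w, h = operands[-4:]
            corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
            for i in range(4):
                segments.append((corners[i], corners[(i + 1) % 4]))
            current = subpath_start = (x, y)
        elif token == "h" and current is not None and subpath_start is not None:
            if current != subpath_start:
                segments.append((current, subpath_start))
            current = subpath_start
        operands.clear()
    return segments
```

For example, `"0 0 m 100 0 l 100 50 l h S"` yields three segments: two drawn with `l` and the closing edge added by `h`.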

    If you're talking about ignoring what the actual contents are and just ray-tracing it, then the fact that it's a PDF is pretty irrelevant; just treat each page as an image and ray-trace it.
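For the raster route, once a page has been rendered to a grayscale image (by whatever PDF renderer you pick; not shown here), straight horizontal lines show up as runs of dark pixels. A minimal pure-Python sketch, assuming the image is a list of rows of 0-255 values; `horizontal_runs`, `threshold` and `min_length` are hypothetical names, and a real tool would use an image-processing library and handle other orientations too.

```python
def horizontal_runs(pixels, threshold=128, min_length=10):
    """Find horizontal runs of dark pixels in a grayscale raster.

    pixels is a list of rows (lists) of 0-255 values, 0 = black.
    Returns (row, start_col, end_col) tuples, end_col inclusive, for runs of
    at least min_length consecutive pixels darker than threshold.
    """
    runs = []
    for y, row in enumerate(pixels):
        start = None
        # Append a white sentinel pixel so a run touching the right edge is closed.
        for x, value in enumerate(row + [255]):
            if value < threshold:
                if start is None:
                    start = x
            elif start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
                start = None
    return runs
```

Vertical lines can be found the same way by scanning columns instead of rows.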

    EDIT: And tbh, no matter what the answer to any of the above is, I can't see a compelling reason why you'd do it in one language over another; use whatever you're familiar with. Python? Sure. C? Sure. C++? Sure. C#, Swift, Go, Haskell, Perl, no doubt doable. Hell, you could do it in Matlab if that's what you know. I wouldn't do it in JavaScript, but no doubt that's possible too :p
     
    Last edited: Oct 6, 2018
  5. elh9

    elh9 Member

    Joined:
    Feb 28, 2016
    Messages:
    97
    Location:
    Perth NOR
    Seems like library availability might rule out a couple of the languages you've listed there.
    PDFBox by Apache and PDFLib are pretty big PDF libraries (Java and .NET respectively); it seems like sticking to one of those two would be the easiest.
     
  6. Foliage

    Foliage Member

    Joined:
    Jan 22, 2002
    Messages:
    32,016
    Location:
    Sleepwithyourdadelaide
    If you convert it to an image, then just pick Python or whatever else has a good image-processing library.
     
  7. elh9

    elh9 Member

    Joined:
    Feb 28, 2016
    Messages:
    97
    Location:
    Perth NOR
    Seems like you'd lose some relevant information by rasterizing, though.
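One concrete kind of loss is coordinate precision: once endpoints are snapped to the pixel grid, measured lengths can only be whole multiples of one pixel. A minimal sketch, assuming a horizontal line given by its x-coordinates in PDF points (72 per inch) and a hypothetical rendering resolution `dpi`; the function names are illustrative only.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space units per inch


def snap_to_pixels(value_pt, dpi):
    """Round a PDF coordinate (points) to the nearest pixel index at the given DPI.

    Uses Python's round(), so exact .5 ties go to the even pixel.
    """
    return round(value_pt * dpi / POINTS_PER_INCH)


def rasterized_length_pt(x0_pt, x1_pt, dpi):
    """Length of a horizontal line as recovered from pixel positions, back in points."""
    pixels = snap_to_pixels(x1_pt, dpi) - snap_to_pixels(x0_pt, dpi)
    return pixels * POINTS_PER_INCH / dpi
```

A line from 10.3 pt to 110.7 pt is 100.4 pt long, but measures 101.0 pt at 72 DPI and about 100.32 pt at 300 DPI. Rasterizing also discards which pixels belong to which path object, which the vector approach keeps.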
     
