CFD Online Discussion Forums

extract bibtex data from pdf/ps files?? (https://www.cfd-online.com/Forums/main/12303-extract-bibtex-data-pdf-ps-files.html)

pdf-bibtex September 26, 2006 11:02

extract bibtex data from pdf/ps files??
 
Hi,

Very quick but important question: does anybody know of a Unix/Linux tool that can extract the metadata from PDF or PS files and write it out in BibTeX format? I have hundreds, nearly thousands, of PDF files, and I would like to archive them and have the BibTeX info.

Any help would be appreciated. I'm looking for either a tool that can do the job in one go or a workaround, as long as I don't have to type it all out....

Cheers,

Jim_Park September 26, 2006 11:57

Re: extract bibtex data from pdf/ps files??
 
I don't understand all of your needs, but you can extract a lot of the pdf 'meat' using

www.nuance.com/BusinessPDF.

At least that's what their ads say! : o )

pdf-bibtex September 26, 2006 12:02

Re: extract bibtex data from pdf/ps files??
 
Because I'm a poor student, I was kind of after something that I didn't have to pay for. What I'm really after is a method that takes a pdf/ps file as its input and writes out the author, year, journal, date, keywords, title, etc. of the paper in BibTeX format, which I can then copy and paste into my ref.bib file.

See, easy, that's all I want ;-))

Actually, really useful, not sure if you agree. Any more thoughts? Ever heard of this???

Cheers,

Mani September 26, 2006 12:56

Re: extract bibtex data from pdf/ps files??
 
I am almost certain that this is a near impossible task. What you are asking for is a program that can

a) extract text from a pdf (so far so good), or from a postscript (very difficult, to say the least)

b) identify the bibliography section (not hard to do)

c) read each bibliography entry and dissect it into "author, year, journal...", not knowing the order of the items or the particular format, which may differ from paper to paper because of different styles (a formidable task!)

d) write a bibtex file (easy, once you have finished the impossible :)

It's not a trivial problem, but maybe not as difficult as CFD. It certainly would be useful. If you find such a program or end up writing one yourself, please share!
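Step (c) above can be illustrated with a minimal sketch. It assumes a single, hypothetical "Authors (Year). Title. Journal." reference style; the pattern name `APA_LIKE`, the function `dissect_reference`, and both sample strings are made up for illustration. It works for that one style and fails on another, which is the difficulty being described.

```python
import re

# Hypothetical pattern for ONE reference style only:
#   Authors (Year). Title. Journal.
APA_LIKE = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<journal>.+?)\.?$"
)


def dissect_reference(entry):
    """Return a dict of fields, or None if the entry is not in the assumed style."""
    match = APA_LIKE.match(entry.strip())
    return match.groupdict() if match else None


# Same paper written in two different (assumed) reference styles.
style_a = ("Barth, T.J. and Jespersen, D.C. (1989). The design and application "
           "of upwind schemes on unstructured meshes. AIAA Paper 89-0366.")
style_b = ("[1] T.J. Barth and D.C. Jespersen, The design and application "
           "of upwind schemes on unstructured meshes, AIAA Paper 89-0366, 1989.")

parsed_a = dissect_reference(style_a)  # fields are recovered
parsed_b = dissect_reference(style_b)  # None: this style needs its own pattern
```

Every additional style would need its own pattern, and nothing in the text says which style a given paper uses.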

Harish September 26, 2006 21:51

Re: extract bibtex data from pdf/ps files??
 
The way I do it is using the electronic god called Google. Go to scholar.google.com, make sure to turn on the option to show links to import citations into BibTeX in the Scholar preferences, and type the paper name in the search bar. A link will appear below each paper (import into BibTeX), and if you click on it, it will take you to a page with the citation. Copy and paste it into your BibTeX file.
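With hundreds of papers, the copy-and-paste step is where duplicates tend to creep in. The sketch below is one possible way to guard against that, not part of the Google method itself: it appends a pasted entry to the text of a .bib file only if its citation key is not already there. The names `citation_keys` and `add_entry` are hypothetical, and reading and writing the actual ref.bib file is left out.

```python
import re

# Matches the citation key in "@Type{key," at the start of an entry.
KEY_PATTERN = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def citation_keys(bib_text):
    """Return the set of citation keys found in a chunk of BibTeX text."""
    return set(KEY_PATTERN.findall(bib_text))


def add_entry(bib_text, pasted_entry):
    """Append pasted_entry to bib_text unless one of its keys is already present."""
    new_keys = citation_keys(pasted_entry)
    if not new_keys or new_keys & citation_keys(bib_text):
        return bib_text  # nothing recognisable to add, or a duplicate key
    prefix = bib_text.rstrip() + "\n\n" if bib_text.strip() else ""
    return prefix + pasted_entry.strip() + "\n"


# Illustrative entry; the key and fields are placeholders.
pasted = "@article{barth1989,\n  title = {The design and application of upwind schemes on unstructured meshes},\n  year = {1989}\n}"

ref_bib = add_entry("", pasted)             # first paste is added
ref_bib_again = add_entry(ref_bib, pasted)  # second paste is ignored
```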


diaw (Des_Aubery) September 26, 2006 22:48

Re: extract bibtex data from pdf/ps files??
 
Now, *that* is cunning... :)

diaw...

pdf-bibtex September 27, 2006 03:14

Re: extract bibtex data from pdf/ps files??
 
Many, many thanks for all your responses. But!!!!!

I'm not asking for what Mani suggests; I'm asking for a piece of code that does what Harish suggests. All I want is the information about the paper I supply as the input, not about the papers referenced within it. For example, were I to supply Barth's classic linear reconstruction paper from 1989 as the input in pdf format, it would return the following in BibTeX format:

@Article{barth1989,
  author = {T.J. Barth and D.C. Jespersen},
  title = {The design and application of upwind schemes on unstructured meshes},
  journal = {AIAA},
  year = 1989,
  volume = 89,
  number = 0366
}
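Writing out such an entry is the easy part once the fields are known. A minimal sketch, assuming the metadata has already been obtained somehow and is held in a plain dictionary; the function name `bibtex_entry` is hypothetical. Note that BibTeX expects multiple authors to be separated by the word "and" rather than by commas.

```python
def bibtex_entry(key, entry_type, fields):
    """Format one BibTeX entry; list values (e.g. authors) are joined with ' and '."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = " and ".join(value)
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Field values taken from the example in this post.
entry = bibtex_entry(
    "barth1989",
    "Article",
    {
        "author": ["T.J. Barth", "D.C. Jespersen"],
        "title": "The design and application of upwind schemes on unstructured meshes",
        "journal": "AIAA",
        "year": "1989",
    },
)
print(entry)
```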

I don't want the references within Barth's paper, just the BibTeX info for the paper itself. Please note that this has already been thought of in the form of libextractor, which can be got from SourceForge, but it will not install without massive system updates to Linux, which I'm not going to do as I'm writing up!!

So many thanks again, and any further thoughts, now that I've refined the question, would be helpful...

Cheers,

Mani September 27, 2006 06:50

Re: extract bibtex data from pdf/ps files??
 
I see, you have made it a bit easier, but not by much. The difficulty now is to identify title, author, journal, year, and so on from the front page. Title and author should be easy, but the rest is actually not always printed, and if it is, it may be a footnote, a headnote, or at some random place. I am curious if the google method works, though. Have you tried?
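To make the front-page difficulty concrete, here is a deliberately naive sketch. It assumes the first page has already been converted to plain text, that the title is the first non-empty line and the authors the second, and that the first four-digit number starting with 19 or 20 is the year. The function name `guess_front_page_fields` and the sample page layout are made up for illustration; real papers will break each of these assumptions in some way.

```python
import re

# First standalone four-digit number beginning with 19 or 20.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def guess_front_page_fields(first_page_text):
    """Guess title, author and year from first-page text using crude heuristics."""
    lines = [ln.strip() for ln in first_page_text.splitlines() if ln.strip()]
    match = YEAR.search(first_page_text)
    return {
        "title": lines[0] if lines else None,            # assumption: title comes first
        "author": lines[1] if len(lines) > 1 else None,  # assumption: authors come second
        "year": match.group(0) if match else None,
    }


# Hypothetical front-page layout, for illustration only.
sample_page = """
The design and application of upwind schemes on unstructured meshes
T.J. Barth and D.C. Jespersen

AIAA Paper 89-0366, 1989
"""

guess = guess_front_page_fields(sample_page)
```

A headnote above the title, a two-line title, or a copyright year in a footnote would each produce a wrong guess, and the journal name is not attempted at all.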

To be honest with you, now that you have explained exactly what you want, I am not so sure about the necessity any more. Obviously, it would save you some effort of typing. However, you surely would only reference papers that you have actually read (right?), and how much time does it take to make a bibtex entry for a paper, compared to the time you need to read even the abstract and conclusions? Don't be so lazy!! (just kidding,... let me know if it works)

Cut cell September 27, 2006 06:54

Re: extract bibtex data from pdf/ps files??
 
I'm on the case now with the Google method. It's only failed to get one paper out of about 100 so far, and I'd never use that one anyway. Totally amazing method - the oracle rules again!

Cheers,

Mani September 27, 2006 07:10

Re: extract bibtex data from pdf/ps files??
 
Scary!

AUG January 19, 2010 19:10

How did you do it? :)
Is it possible on Windows?

f-w January 20, 2010 11:26

Cool Google trick. I use the free Mendeley desktop client (www.mendeley.com) to manage my PDF collection (it's like Picasa for PDFs). You can right-click and export BibTeX in it ...


All times are GMT -4.