
extract bibtex data from pdf/ps files??

Old   September 26, 2006, 12:02
Default extract bibtex data from pdf/ps files??
  #1
pdf-bibtex
Guest
 
Posts: n/a
Hi,

Very quick but important question: does anybody know of a Unix/Linux tool that can extract the metadata from PDF or PS files and write it out in BibTeX format? I have hundreds, nearly thousands, of PDF files, and I would like to archive them and have the BibTeX info.

Any help would be appreciated. I'm looking for either a tool that can do the job in one go or a workaround, as long as I don't have to type it all out....
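One possible workaround, sketched here rather than recommended: the `pdfinfo` command-line tool (shipped with Xpdf/Poppler) prints a PDF's embedded metadata as `Key: value` lines, which are easy to collect into a dictionary. The function names and the sample text below are made up for illustration, and many PDFs carry empty or misleading Title/Author fields, so treat the result as a starting point only.

```python
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo-style 'Key: value' lines into a dict, skipping empty values."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so dates like 12:02:00 stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def read_pdf_metadata(path):
    """Run pdfinfo on one file (assumes pdfinfo is installed and on the PATH)."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)


# Made-up sample in the shape of pdfinfo output, not taken from a real file.
sample = """Title:          An Example Title
Author:         J. Doe
Producer:
CreationDate:   Tue Sep 26 12:02:00 2006
"""
fields = parse_pdfinfo(sample)
print(fields.get("Title"), "|", fields.get("Author"))
```

Journal, volume and keywords are rarely stored in these fields, so this covers only part of a BibTeX entry.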

Cheers,

Old   September 26, 2006, 12:57
Default Re: extract bibtex data from pdf/ps files??
  #2
Jim_Park
Guest
 
Posts: n/a
I don't understand all of your needs, but you can extract a lot of the pdf 'meat' using

www.nuance.com/BusinessPDF.

At least that's what their ads say! : o )

Old   September 26, 2006, 13:02
Default Re: extract bibtex data from pdf/ps files??
  #3
pdf-bibtex
Guest
 
Posts: n/a
Because I'm a poor student, I was kind of after something that I didn't have to pay for. What I'm really after is a method that takes a PDF/PS file as its input and writes out the author, year, journal, date, keywords, title, etc. of the paper in BibTeX format, which I can then copy and paste into my ref.bib file.

See, easy, that's all I want ;-))

Actually, really useful, not sure if you agree. Any more thoughts, ever heard of this???

Cheers,

Old   September 26, 2006, 13:56
Default Re: extract bibtex data from pdf/ps files??
  #4
Mani
Guest
 
Posts: n/a
I am almost certain that this is a near-impossible task. What you are asking for is a program that can

a) extract text from a PDF (so far so good), or from a PostScript file (very difficult, to say the least)

b) identify the bibliography section (not hard to do)

c) read each bibliography entry and dissect it into "author, year, journal...", not knowing the order of the items or the particular format, which may differ from paper to paper because of different styles (a formidable task!)

d) write a BibTeX file (easy, once you have finished the impossible)

It's not a trivial problem, but maybe not as difficult as CFD. It certainly would be useful. If you find such a program or end up writing one yourself, please share!
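To make the difficulty of step (c) concrete, here is a minimal sketch assuming exactly one citation style, "Authors (Year). Title. Journal.". The pattern, the function name and both sample entries are invented for illustration; an entry in any other style simply fails to match, which is the problem described above.

```python
import re

# Assumption: handles one style only, "Authors (Year). Title. Journal."
AUTHOR_YEAR_STYLE = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<journal>.+?)\.?$"
)


def dissect_reference(entry):
    """Return a dict of fields, or None when the entry is in another style."""
    match = AUTHOR_YEAR_STYLE.match(entry.strip())
    return match.groupdict() if match else None


# Both entries are made up; they describe the same fictional paper.
styled = "Doe, J. and Roe, R. (1999). An example title. Journal of Examples."
other = "[3] J. Doe and R. Roe, An example title, J. Examples 12 (1999) 34-56"

print(dissect_reference(styled))  # fields found
print(dissect_reference(other))   # None: numbered style is not recognised
```

Covering real papers would need one such pattern per style plus a way to tell the styles apart, and a title containing ". " would already break this one.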

Old   September 26, 2006, 22:51
Default Re: extract bibtex data from pdf/ps files??
  #5
Harish
Guest
 
Posts: n/a
The way I do it is using the electronic god called Google. Go to scholar.google.com, make sure to turn on "show links to import citations into BibTeX" in the Scholar preferences, and type the paper name into the search bar. A link ("Import into BibTeX") will appear below each paper; if you click on it, it will take you to a page with the citation. Copy and paste it into your BibTeX file.
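For a long list of papers, the typing part of this routine can be shortened by generating the search links up front. This is only a sketch: `scholar_search_url` is a made-up helper, and the `q` parameter is simply what appears in the browser's address bar after a search, not a documented API. The links are meant to be opened by hand, and the BibTeX link still has to be clicked on each result.

```python
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_search_url(title):
    """Return a search URL with the title quoted as an exact phrase."""
    return SCHOLAR_SEARCH + "?" + urlencode({"q": f'"{title}"'})


# Titles could come from file names or a hand-kept list (assumption).
titles = ["The design and application of upwind schemes on unstructured meshes"]
for title in titles:
    print(scholar_search_url(title))
```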


Old   September 26, 2006, 23:48
Default Re: extract bibtex data from pdf/ps files??
  #6
diaw (Des_Aubery)
Guest
 
Posts: n/a
Now, *that* is cunning...

diaw...

Old   September 27, 2006, 04:14
Default Re: extract bibtex data from pdf/ps files??
  #7
pdf-bibtex
Guest
 
Posts: n/a
Many, many thanks for all your responses. But!!!!!

I'm not asking for what Mani suggests; I'm asking for a piece of code that does what Harish suggests. All I want is the information about the paper I supply as the input, not about the papers referenced within it. For example, were I to supply Barth's classic linear reconstruction paper from 1989 as the input in PDF format, it would return, in BibTeX format:

@Article{barth1989,
  author  = {T.J. Barth and D.C. Jespersen},
  title   = {The design and application of upwind schemes on unstructured meshes},
  journal = {AIAA},
  year    = 1989,
  volume  = 89,
  number  = 0366
}
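Writing such an entry is the easy half of the job. A minimal sketch of that half, assuming the fields have already been found somehow; `to_bibtex` is a hypothetical helper, and it braces every value and does no escaping of special characters:

```python
def to_bibtex(entry_type, key, fields):
    """Format a dict of fields as one BibTeX entry (values are braced as-is)."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


authors = ["T.J. Barth", "D.C. Jespersen"]
entry = to_bibtex(
    "Article",
    "barth1989",
    {
        # BibTeX separates multiple authors with the word "and".
        "author": " and ".join(authors),
        "title": "The design and application of upwind schemes on unstructured meshes",
        "journal": "AIAA",
        "year": "1989",
        "volume": "89",
        "number": "0366",
    },
)
print(entry)
```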

I don't want the references within Barth's paper, just the BibTeX info for the paper itself. Please note that this has been thought of in the form of libextractor, which can be got from SourceForge, but it will not install without massive system updates to Linux, which I'm not going to do as I'm writing up!!

So many thanks again, and any further thoughts now I've refined the question would be helpful...

Cheers,

Old   September 27, 2006, 07:50
Default Re: extract bibtex data from pdf/ps files??
  #8
Mani
Guest
 
Posts: n/a
I see, you have made it a bit easier, but not by much. The difficulty now is to identify title, author, journal, year, and so on from the front page. Title and author should be easy, but the rest is actually not always printed, and if it is, it may be a footnote, a headnote, or at some random place. I am curious if the google method works, though. Have you tried?
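A rough sketch of that front-page guessing, to show where it holds up and where it does not. It assumes the first page has already been converted to plain text (for instance with a tool such as `pdftotext`), that the title is the first non-empty line and the authors the second, and that the first four-digit number starting with 19 or 20 is the year. The function name and the sample page are made up.

```python
import re

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def guess_front_page_fields(page_text):
    """Guess title, author and year from first-page text; all three are heuristics."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    guess = {}
    if lines:
        guess["title"] = lines[0]      # assumption: title comes first
    if len(lines) > 1:
        guess["author"] = lines[1]     # assumption: authors follow the title
    match = YEAR.search(page_text)
    if match:
        guess["year"] = match.group(0)  # first year-like number, wherever it is
    return guess


# Made-up front page, not taken from a real paper.
page = """An Example Title
J. Doe and R. Roe
Example University

Presented at the Example Conference, 1999
"""
print(guess_front_page_fields(page))
```

A running header above the title, a two-line title, or a year that only appears in a cited reference would each give a wrong answer, and journal and volume are not attempted at all.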

To be honest with you, now that you have explained exactly what you want, I am not so sure about the necessity any more. Obviously, it would save you some effort of typing. However, you surely would only reference papers that you have actually read (right?), and how much time does it take to make a bibtex entry for a paper, compared to the time you need to read even the abstract and conclusions? Don't be so lazy!! (just kidding,... let me know if it works)

Old   September 27, 2006, 07:54
Default Re: extract bibtex data from pdf/ps files??
  #9
Cut cell
Guest
 
Posts: n/a
On the case now with google method, it's only failed to get one paper out of about 100 so far, and I'd never use that one anyway. Totally amazing method - the oracle rules again!

Cheers,

Old   September 27, 2006, 08:10
Default Re: extract bibtex data from pdf/ps files??
  #10
Mani
Guest
 
Posts: n/a
Scary!

Old   January 19, 2010, 20:10
Default
  #11
AUG
New Member
 
Join Date: Jan 2010
Posts: 1
How did you do it?
Is it possible on Windows?

Old   January 20, 2010, 12:26
Default
  #12
f-w
Senior Member
 
Join Date: Apr 2009
Posts: 153
Cool Google trick. I use the free Mendeley desktop client (www.mendeley.com) to manage my PDF collection (it's like Picasa for PDFs). You can right-click and export BibTeX in it ...