[PLUG] Re: Looking for a perl programmer

Shantanoo Mahajan shantanoo at gmail.com
Sat Nov 11 11:15:23 PST 2006


+++ Andreas Pigni [PLUG] [10-11-06 23:42 +0100]:
| 
| Dear All,
|
| I wondered if you could help me with the following task. First, I should
| mention that I am a law student in Switzerland.
|
| I would like to have in a single document (preferably Excel) a list of
| all the European patent attorneys with contact details. This information
| is available on the internet. All these attorneys can be found at the
| following link: http://www.european-patent-office.org/reps/search.html
|
| It seems that there are 8353 members.
|
| Thus, it seems that Perl could be useful to fetch all the entries from
| 0001 to 8353:
|
| http://www.european-patent-office.org/cgi-bin/cgiwrap/vi00n006/reps/detail.pl.cgi?id=001 to
| http://www.european-patent-office.org/cgi-bin/cgiwrap/vi00n006/reps/detail.pl.cgi?id=8353
|
| I wondered if someone could write the script for me, provide it to me
| and also provide an Excel table with all the entries and all the
| contact details for each attorney (half a day of work?). Of
| course, I would pay the programmer for that. Please let me know how much
| that would cost me (student price if possible?).
|
| Please let me know as soon as possible if someone (you, a friend?) could
| do this task for me.
|
| Thanks a lot for your help.
|
| Kind regards,
|
| Andreas

Any takers?
Hint:

- get all the pages, e.g.
  perl -e 'for my $i (1 .. 8353) { system("wget", "-O", "rep$i.html", "http://www.european-patent-office.org/cgi-bin/cgiwrap/vi00n006/reps/detail.pl.cgi?id=$i") }'
- extract the data, e.g.
  perl -ne 'print if /<BLOCKQUOTE>/ ... /<\/BLOCKQUOTE>/' <file>
  Then do some more parsing and save the result as a .csv file, which
  Excel can open.
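For the "more parsing" step, here is a rough Perl sketch. I have not checked it against the real pages: it assumes each detail page keeps the contact details in a single <BLOCKQUOTE> block with one field per <BR>-separated line. The script name tocsv.pl and the subs extract_fields, csv_field and csv_line are made up for illustration. It sticks to core Perl; the Text::CSV module from CPAN would be the more robust way to write the CSV.

```perl
#!/usr/bin/perl
# tocsv.pl (hypothetical name): turn saved attorney detail pages into CSV.
# ASSUMPTION: each page keeps the contact details inside one
# <BLOCKQUOTE>...</BLOCKQUOTE> block, one field per <BR>-separated line.
# Check a real page and adjust extract_fields() if the layout differs.
use strict;
use warnings;

# Return the text fields found inside the first BLOCKQUOTE of a page,
# or an empty list if the page has no BLOCKQUOTE.
sub extract_fields {
    my ($html) = @_;
    my ($block) = $html =~ m{<BLOCKQUOTE>(.*?)</BLOCKQUOTE>}is
        or return ();
    my @fields;
    for my $part (split m{<BR\s*/?>}i, $block) {
        $part =~ s/<[^>]*>//g;     # strip any remaining tags (<B>, <A>, ...)
        $part =~ s/&nbsp;/ /gi;    # decode only the two most common entities
        $part =~ s/&amp;/&/gi;
        $part =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        push @fields, $part if length $part;
    }
    return @fields;
}

# Quote one field the usual CSV way: wrap it in double quotes and
# double any embedded quote characters.
sub csv_field {
    my ($s) = @_;
    $s =~ s/"/""/g;
    return qq{"$s"};
}

# Join fields into one CSV record terminated by a newline.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Read each file named on the command line and print one CSV row per page,
# with the file name first so odd rows can be traced back to their page.
sub main {
    for my $file (@_) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my $html = do { local $/; <$fh> };
        close $fh;
        my @fields = extract_fields($html);
        print csv_line($file, @fields) if @fields;
    }
}

# Only process files when run as a script, so the subs can be reused.
main(@ARGV) unless caller;
```

Assuming the pages were saved as .html files in the current directory, something like this should give a file Excel can open:

  perl tocsv.pl *.html > attorneys.csv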

Shantanoo

-- 
Sometimes you'll be on a winning streak and everything will click; take
maximum advantage. When the opposite is true, hold steady and wait it
out.