Gosu Forums
Up Topic Gosu / Gosu Exchange / [Off Topic] Ruby Regexp URL HREF
- - By kyonides Date 2010-08-21 02:23 Edited 2010-08-21 02:57
Lately I've been trying to fetch a list of specific links from a website or forum so I can update a topic list almost automatically, using only Ruby and a few rubygems. Maybe this wasn't the best idea I ever had, but this scriptlet got me closer to what I actually wanted to achieve. (I forgot how to paste it here using BBCode or something like that...)

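require 'open-uri'
require 'nokogiri'

# Fetch and parse the subforum page (URI.open is the open-uri entry point on current Rubies).
doc = Nokogiri::HTML(URI.open('http://some.website.com/12345678-some-subforum/'))
results = []
# Visit every node that carries an href attribute.
doc.search('//*[@href]').each do |m|
  href = m[:href]
  # Keep topic links only; skip "last message" anchors and forum-id links.
  next unless href.include?('12345678')
  next if href.include?('#lastmsg') || href.include?('forumid=')
  results << "<a href=\"#{href}\"></a>"
end
# uniq instead of uniq!, because uniq! returns nil when there are no duplicates.
File.open('href.txt', 'w') { |file| file.puts results.uniq }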

The results are several lines like this one...

<a href="/12345678/98766432-some-weird-topic/"></a>

...but I need to get the actual link name, too, so it looks like this...

<a href="/12345678/98766432-some-weird-topic/">Some Weird Topic</a>

...but IDK how to get the "Some Weird Topic" string from the doc variable... I guess I should nest another each iterator, but if IDK what value to pass to the search method, I won't be able to get the string...

I wonder if any of you have some experience with this kind of issue...
Parent - - By banister Date 2010-08-21 04:25
ask on www.stackoverflow.com
Parent - By kyonides Date 2010-08-21 04:54
Well, I guess that's the only thing I can do for now... I solved the issue but the code doesn't look neat...
Parent - - By erisdiscord Date 2010-08-21 05:13
Weeelll, this isn't a general Ruby help forum, but I happen to know the answer since I have experience with Nokogiri. You want to call m.content to get the text content.

Except, actually, you could replace your results << … line with results << m.to_s and this will give you the full link HTML.
Parent - By kyonides Date 2010-08-21 20:01
I'd use the m.content option just in case I get other tags or nodes like <img> which might also include hrefs.

jlnr, what if I were trying to get the highscores for a Ruby game from x forum? No one ever asked me why I was doing this anyway...
Parent - By jlnr (dev) Date 2010-08-21 10:25
Have to second banisterfiend: for general Ruby questions, StackOverflow is a way better place. I'm happy when this forum is lively, but its point is in its focus.