Looking around Google's Webmaster Tools, I found a well-hidden link to Google's help page about sitemap formatting. After looking at the Sitemap protocol (just a simple XML file), I set off to create a method that would take my dynamic website and build a nice little map for it. After all, if you cannot find it in Google, you really don't need it.
I am sure I could come up with a general method to do this, but currently it is specific to how I have built my website.
You have to start your XML document. I really like using rexml/document; you will also need the zlib library.<pre>
require 'rexml/document'
require 'zlib'
require 'time' # for Time.parse, used further down
data=REXML::Document.new('<?xml version="1.0" encoding="UTF-8"?>')
base=data.add_element("urlset")
base.attributes["xmlns"]="http://www.google.com/schemas/sitemap/0.84"</pre>
These values come from the Sitemap protocol website.

Then I add my static content values:<pre>
el=base.add_element("url")
el.add_element("loc").text="http://www.stephenbeckeriv.com/"
#lastmod needs to be in W3C datetime format. For the life of me I could not get Time's w3cdtf to work.
el.add_element("lastmod").text=Time.now.strftime("%Y-%m-%d")
el.add_element("changefreq").text="weekly"
el.add_element("priority").text="0.5"
#turns out they do not like svn sites
#el=base.add_element("url")
#el.add_element("loc").text="http://www.svn.stephenbeckeriv.com/code"
#el.add_element("lastmod").text=Time.now.strftime("%Y-%m-%d")
#el.add_element("changefreq").text="weekly"
#el.add_element("priority").text="0.5"</pre>
Then I add the context pages of my site:
<pre>
@context_list = Post.find(:all,:select=>"context")
arr=[]
@context_list.each{|x|arr.push(x.context)}
arr.uniq.each{|context|
el=base.add_element("url")
el.add_element("loc").text="http://www.stephenbeckeriv.com/
el.add_element("lastmod").text=Time.now.strftime("%Y-%m-%d")
el.add_element("changefreq").text="weekly"
el.add_element("priority").text="0.5"
} </pre>
I know I could make the results unique in the SQL itself, but I could not remember the syntax for it.

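For what it is worth, the SQL keyword for this is `DISTINCT`. In the Rails of that era the query could probably have been written as `Post.find(:all, :select => "DISTINCT context")`, but treat that as an assumption to check against your own Rails version. The plain-Ruby sketch below uses a made-up `Post` struct and made-up context values in place of the real model, just to show the `map` plus `uniq` step on its own:

```ruby
# Stand-in for the ActiveRecord model; the real Post class is not shown here.
Post = Struct.new(:context)

# Made-up rows for illustration only
posts = [Post.new("ruby"), Post.new("rails"), Post.new("ruby")]

# map collects the context values, uniq drops the repeats
contexts = posts.map { |post| post.context }.uniq
```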
Then I add my dynamic content:<pre>
@post=Post.find(:all)
@post.each{|post|
el=base.add_element("url")
#if you read my first post you know why this works
el.add_element("loc").text="http://www.stephenbeckeriv.com/
a=Time.parse("#{post.created}")
el.add_element("lastmod").text=a.strftime("%Y-%m-%d")
el.add_element("changefreq").text="never"
el.add_element("priority").text="0.8"
}</pre>
We now have all the web pages for my site. Next we need to save this to a file in public. There are some limits for the file: it cannot contain more than 50,000 URLs, and it must be smaller than 10MB. If that is a problem, you have to use sitemap index files, which I do not cover here.<pre>
result=""
data.write(result)
dir=Dir.pwd
#locally, Dir.pwd does not include /public
if dir.include?("/public")
  dir<< "/sitemap.xml.gz"
else
  dir<< "/public/sitemap.xml.gz"
end
Zlib::GzipWriter.open(dir){|file|
file.write result
}</pre>
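The size limits mentioned earlier can be checked before the file is written, and the gzip step can be verified by reading the file back. A minimal sketch, assuming two made-up helpers, `within_limits?` and `write_gzipped`; the limits are passed in as arguments, and the tiny numbers in the usage lines are for demonstration only:

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical guard: the limits are passed in rather than hard-coded,
# so check them against the current Sitemap protocol documentation.
def within_limits?(xml, url_count, max_urls:, max_bytes:)
  url_count <= max_urls && xml.bytesize <= max_bytes
end

# Hypothetical helper: writes the XML string gzip-compressed and
# returns the path it wrote to.
def write_gzipped(xml, path)
  Zlib::GzipWriter.open(path) { |gz| gz.write(xml) }
  path
end

xml = '<urlset><url><loc>http://www.stephenbeckeriv.com/</loc></url></urlset>'

# Demonstration limits only, far smaller than the real ones
ok = within_limits?(xml, 1, max_urls: 10, max_bytes: 1024)

round_trip = nil
Dir.mktmpdir do |tmp|
  path = write_gzipped(xml, File.join(tmp, "sitemap.xml.gz"))
  # Read the file back to confirm it decompresses to the same string
  Zlib::GzipReader.open(path) { |gz| round_trip = gz.read }
end
```

Note that the byte check is done on the uncompressed string, since that is what `data.write` produces before gzip gets involved.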
And you are done! Make this a method on your admin console and submit the link to Google. A few things I still want to do with this: provide a link to the XML, and create a dynamic site map with Ajax using the same ideas. If someone wanted to present a map of my site, they could take the XML and format it however they like. I am also thinking about creating a general mapper that looks at the views and routes.rb to try to create the proper links.
Update:
It turns out that when I added my site to Google's Webmaster Tools I used http://stephenbeckeriv.com/, not http://www.stephenbeckeriv.com/, so it does not like my sitemap because I added the www. How fun and flexible.
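One way to avoid this kind of mismatch is to keep the host in a single place and build every `loc` from it. A minimal sketch, assuming a constant named `SITE_ROOT` and two helpers, `site_url` and `mismatched_hosts`, all of which are made up here:

```ruby
require 'uri'

# Assumed to match exactly the address registered in Webmaster Tools
SITE_ROOT = "http://stephenbeckeriv.com/"

# Hypothetical helper: builds an absolute URL under SITE_ROOT
def site_url(path)
  URI.join(SITE_ROOT, path).to_s
end

# Hypothetical check: returns the locs whose host differs from SITE_ROOT's
def mismatched_hosts(locs)
  expected = URI.parse(SITE_ROOT).host
  locs.reject { |loc| URI.parse(loc).host == expected }
end

locs = [site_url("/"), site_url("/sitemap.xml.gz"), "http://www.stephenbeckeriv.com/"]
bad = mismatched_hosts(locs)
```

Running the check over every `loc` before writing the file would flag the stray www entry instead of leaving Google to reject it.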
I expect most of the people on the net have some type of ad blocker. I know I do. Which is good news because I should never click my own google ads. Enjoy the ads!
I use Google Analytics to do tracking on my website. I can see the search terms people have used, how long they stay on my site, their browser, OS, and other fun facts. I was looking through Google's API code to see if they had an API for Analytics... but they don't. The next best thing is exporting to CSV, TSV, or PDF. So I took the hard road and did an export in PDF. That report is just for one week. I then exported the whole range since I started tracking and used Gmail to turn the PDF into HTML, and now I have this site. Interesting how much someone in Lithuania likes my site. They are direct hits too. Most likely a bot. I hope to load the CSV data into a database and make nicer tables. I will do this for search terms too.
While working on my newest project, http://iblog.deathbyescalator.com/blog_import, I hit a bit of a problem while posting to Blogger. I did complete the project: it now uploads to Blogger and WordPress, but the Google Calendar import does not work.
Back to the problem at hand. I found a few sites that looked promising, so I installed the GData-0.0.4 gem, with no luck.
So after looking at the gdata gem and the Blogger API, I built a quick script to post a blog entry to Blogger.
Full script at
http://svn.stephenbeckeriv.com/code/gdata/blogger/
But the good parts: Auth
require 'net/https'
require 'uri'

module Net
  # HTTP subclass that always turns SSL on
  class HTTPS < HTTP
    def initialize(address, port = nil)
      super(address, port)
      self.use_ssl = true
    end
  end
end
GOOGLE_LOGIN_URL = URI.parse('https://www.google.com/accounts/ClientLogin')
# Authenticates against ClientLogin and returns the headers needed for later calls
def get_gdata_headers(email,pass,service = 'blogger',source = 'gdata-ruby' ,url = 'www.blogger.com')
  headers = {}
  response = Net::HTTPS.post_form(GOOGLE_LOGIN_URL,
    {'Email' => email,
     'Passwd' => pass,
     'source' => source,
     'service' => service })
  raise "Not Authorized" unless response.kind_of? Net::HTTPSuccess
  headers['Content-Type'] = 'application/atom+xml'
  # the token is whatever follows the last "="; strip the trailing newline
  headers['Authorization'] = "GoogleLogin auth=#{response.body.split(/=/).last.strip}"
  headers
end
I got the HTTPS class from the GData lib. I like the idea. This call does nothing but authenticate with the Blogger service and produce the header hash needed for later calls. The same method can be used for all the Google APIs.
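The `split(/=/).last` trick relies on the `Auth` line being the last one in the response. As far as I understand ClientLogin, the body is a set of `KEY=value` lines (`SID`, `LSID`, `Auth`), so a slightly more defensive sketch could parse them into a hash first. `parse_client_login`, `auth_headers`, and the sample tokens below are made up for illustration:

```ruby
# Hypothetical helper: turns "KEY=value" lines into a hash
def parse_client_login(body)
  body.each_line.each_with_object({}) do |line, pairs|
    # split on the first "=" only, so a token containing "=" stays whole
    key, value = line.strip.split("=", 2)
    pairs[key] = value if key && value
  end
end

# Hypothetical helper: builds the header hash from a response body
def auth_headers(body)
  token = parse_client_login(body)["Auth"]
  raise "No Auth token in response" unless token
  { 'Content-Type'  => 'application/atom+xml',
    'Authorization' => "GoogleLogin auth=#{token}" }
end

# Placeholder response body, not a real token
sample = "SID=aaa\nLSID=bbb\nAuth=ccc=ddd\n"
headers = auth_headers(sample)
```

Looking the token up by name also means a missing `Auth` line fails loudly instead of sending some other value as the token.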
Posting to the right blog
def find_blog_by_url_from_xml(blog_url,xml)
  xml_of_blogs = REXML::Document.new(xml)
  # collect every entry element (one per blog)
  blogs = xml_of_blogs.each_element("/*/entry"){|z| z}
  # take the href of the first link in each entry
  urls = blogs.map{|e|
    e.each_element("link"){|mr| mr}[0].attributes["href"]
  }
  # pick the entry whose link contains the blog url
  index = urls.index(urls.select{|u| u.include?(blog_url)}.first)
  blogs[index]
end
I have many blogs but want to post to just one of them. This method takes the body from an HTTP call I will show later. That call returns XML with all of the blog info: links, author data, ids, and other goodies. I do not think this is the best way to find the XML data I want. It is just one way that works.
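A slightly shorter way to do the same kind of lookup is `find`, which stops at the first match. This is a sketch, not the code from the full script: `find_blog_entry` and the sample feed (with example.com addresses) are made up, and unlike the method above it checks every link in an entry rather than only the first one.

```ruby
require 'rexml/document'

# Hypothetical variant: returns the first entry that has a link
# whose href contains blog_url, or nil if none matches.
def find_blog_entry(xml, blog_url)
  root = REXML::Document.new(xml).root
  entries = root.elements.to_a.select { |e| e.name == "entry" }
  entries.find do |entry|
    entry.elements.to_a.any? do |child|
      child.name == "link" && child.attributes["href"].to_s.include?(blog_url)
    end
  end
end

# Made-up feed for illustration only
sample_feed = <<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title><link rel="alternate" href="http://one.example.com/"/></entry>
  <entry><title>Second</title><link rel="alternate" href="http://two.example.com/"/></entry>
</feed>
XML

entry = find_blog_entry(sample_feed, "two.example.com")
title = entry.elements.to_a.find { |e| e.name == "title" }.text
```

Returning `nil` when nothing matches lets the caller decide what to do, where the index-based version would fail on `blogs[nil]`.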
Sending the post:
# BLOGGER_FEED_REGEX and build_atom_entry_xml are defined in the full script
def post_xml_to_blog(base_url, headers, blog_url,post_array)
  conn = Net::HTTP.new(base_url, 80)
  # fetch the list of blogs and pick the one we want
  list_of_blogs = conn.get("/feeds/default/blogs",headers)
  blog_xml = find_blog_by_url_from_xml(blog_url,list_of_blogs.body)
  blog_links = blog_xml.each_element("link"){|mr| mr}
  # the link matching the feed regex is where new posts go
  post_url = blog_links.select{|e| e.attributes["href"] =~ BLOGGER_FEED_REGEX }.first.attributes["href"]
  post_path = post_url.match(BLOGGER_FEED_REGEX).to_s
  response = Array.new
  post_array.each{|entry|
    response << conn.post(post_path, build_atom_entry_xml(entry).to_s, headers)
  }
  response
end
Simple enough. I send in an array holding a hash for each entry to create. I built these methods for my own needs, but they can easily be adapted and changed. Hopefully a fully supported GData lib will arise from somewhere and save me from my one-off scripts.
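`build_atom_entry_xml` lives in the full script and is not shown above. As a rough idea of what such a builder could look like, here is a stand-in written from scratch. The name `build_atom_entry`, the `:title` and `:content` hash keys, and the `type` attributes are assumptions, not the original code:

```ruby
require 'rexml/document'

# Hypothetical stand-in for build_atom_entry_xml: builds a minimal
# Atom entry document from a hash with :title and :content keys.
def build_atom_entry(entry)
  doc = REXML::Document.new
  root = doc.add_element("entry", "xmlns" => "http://www.w3.org/2005/Atom")
  root.add_element("title", "type" => "text").text = entry[:title]
  # REXML escapes the markup, so the HTML travels as text inside <content>
  root.add_element("content", "type" => "html").text = entry[:content]
  doc
end

xml = build_atom_entry(:title => "Hello", :content => "<p>First post</p>").to_s
```

Returning the document rather than a string keeps it compatible with the `.to_s` call made at the point of posting.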