Posted by Mario (Site Owner) at 17/07/04 23:36:
Finally I have managed to mirror the contents of this forum in such a way that it gets indexed by Google. This is a bit special because the whole site is built with Flash, which the Google bot cannot read, and because the forum is read from a database, so there are no static pages that Google could spider.
For those who are interested in the technical side, here is a short explanation: on the front page, inside a <noscript> tag, sits a link to a PHP script which lists the main forums and their descriptions together with a link for each. In theory I could have gone on and created a static page for each forum thread, but I think that would have been a waste of server space. So instead I created a special dynamic 404 error page which creates, on demand, only the pages that the web spiders ask for. And the spiders ask for links that were generated on exactly those error pages. So the bot actually stays on the same error page all the time, but it gets served different content for each URL. Of course the bot doesn't know that: it thinks it has jumped to a new page and indexes it accordingly.
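To make the idea a bit more concrete, here is a minimal sketch of the "generate the page from the requested URL" part. It is written in Python rather than PHP and is not the script running on this site; the URL layout (/forum/<forum>/<thread>), the FORUMS dictionary standing in for the database, and the render function are all assumptions made up for illustration.

```python
from html import escape

# Hypothetical stand-in for the forum database.
FORUMS = {
    "general": {
        "title": "General",
        "threads": {
            "1": ("Welcome", "First post."),
            "2": ("Flash and Google", "The mirror is live."),
        },
    },
}


def render(path):
    """Build the HTML for a requested path, or return None if nothing matches.

    No file exists for any of these paths; the content is created on demand
    from the path alone, which is what the error page does for the spider.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "forum" or len(parts) > 3:
        return None

    if len(parts) == 1:
        # Forum index: one link per forum.
        items = []
        for forum_id, forum in FORUMS.items():
            title = escape(forum["title"])
            items.append(f'<li><a href="/forum/{escape(forum_id)}">{title}</a></li>')
        return "<ul>" + "".join(items) + "</ul>"

    forum = FORUMS.get(parts[1])
    if forum is None:
        return None

    if len(parts) == 2:
        # Thread list: every link points at another path with no file behind it.
        items = []
        for thread_id, (subject, _body) in forum["threads"].items():
            href = f"/forum/{escape(parts[1])}/{escape(thread_id)}"
            items.append(f'<li><a href="{href}">{escape(subject)}</a></li>')
        return "<ul>" + "".join(items) + "</ul>"

    thread = forum["threads"].get(parts[2])
    if thread is None:
        return None
    subject, body = thread
    return f"<h1>{escape(subject)}</h1><p>{escape(body)}</p>"
```

Calling render("/forum/general") returns a list of links such as /forum/general/1, and a later request for that path produces the thread itself, so the spider can walk the whole forum without a single static file existing.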
I found out that Google is very picky about the 404 error header, which was still being sent even though the replacement content had been generated. So it is very important to overwrite the header with a status of 200 OK. Also, in my first attempts I forgot to handle some types of links, which led to dead ends and more 404s. I don't know whether that kept Google from indexing the forum or whether it just took some time. Anyway, it looks like all the threads were indexed pretty fast once I got rid of those "page not found" messages.
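The two lessons here (send 200 OK whenever real content is produced, and make sure no generated link ends in a 404) can be sketched like this. Again this is Python, not the PHP used on this site, and everything in it is hypothetical: the PAGES table, the WSGI function app, and the helpers fetch and find_dead_links exist only for this example.

```python
import re

# Hypothetical pages the handler can generate, keyed by URL path.
PAGES = {
    "/forum": '<a href="/forum/general">General</a>',
    "/forum/general": '<a href="/forum/general/1">Welcome</a>',
    "/forum/general/1": "<p>First post.</p>",
}

HTML_HEADERS = [("Content-Type", "text/html; charset=utf-8")]


def app(environ, start_response):
    """WSGI application playing the role of the dynamic error page."""
    body = PAGES.get(environ.get("PATH_INFO", ""))
    if body is None:
        # Nothing to generate: this really is a missing page.
        start_response("404 Not Found", HTML_HEADERS)
        return [b"<p>Page not found</p>"]
    # Replacement content exists, so the status must say 200 and not 404.
    start_response("200 OK", HTML_HEADERS)
    return [body.encode("utf-8")]


def fetch(path):
    """Call the app directly and return (status line, body text)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app({"PATH_INFO": path}, start_response)
    return captured["status"], b"".join(chunks).decode("utf-8")


def find_dead_links(start):
    """Follow every href reachable from start and list the ones that 404."""
    seen, queue, dead = set(), [start], []
    while queue:
        path = queue.pop()
        if path in seen:
            continue
        seen.add(path)
        status, body = fetch(path)
        if not status.startswith("200"):
            dead.append(path)
            continue
        queue.extend(re.findall(r'href="([^"]+)"', body))
    return dead
```

Running find_dead_links("/forum") before letting a spider in is a cheap way to catch the kind of forgotten link types that caused the extra 404s in my first attempts.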