Since we discovered that an IP had been leeching from us, Blue has found that softtester has lots of IPs doing it.
Blue doesn't think there's anything we can do about it, and he's not seen any protection for it.
I'd like to discuss this and figure out if it's causing us problems. I know it costs resources like bandwidth.
Also, if it is a problem, you'd think it would have happened to other people in the past and there would be some sort of protection.
Comments?
Yeah, I thought a basic session / IP check could be done. Guess we'd have to make sure we got all the bots and redirected them to an appropriate response code page, other than 404. That way, if we screw up, we don't upset the spider.
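For what it's worth, a basic IP check along those lines might look roughly like the sketch below. It is written in Python purely to show the logic. The limits, the crawler list and the function name `status_for_request` are all made up for illustration, and the counters live in memory, so a real site would need shared storage. Throttled clients get a 429 (Too Many Requests) instead of a 404, so a spider caught by mistake is not told the page has gone.

```python
import time
from collections import defaultdict, deque

# Hypothetical values, not taken from the thread.
MAX_REQUESTS = 60                          # requests allowed per window
WINDOW_SECONDS = 60                        # length of the window in seconds
CRAWLER_TOKENS = ("googlebot", "bingbot")  # user-agent substrings to let through

_hits = defaultdict(deque)  # ip -> timestamps of recent requests


def status_for_request(ip, user_agent, now=None):
    """Return the HTTP status code to send for this request."""
    now = time.time() if now is None else now

    # Never throttle known spiders. User agents can be faked, so a real
    # check would also need to verify the crawler some other way.
    if any(token in user_agent.lower() for token in CRAWLER_TOKENS):
        return 200

    window = _hits[ip]
    # Drop timestamps that have fallen out of the window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()

    if len(window) >= MAX_REQUESTS:
        return 429  # Too Many Requests, rather than a misleading 404
    window.append(now)
    return 200
```

The obvious weakness is the one raised elsewhere in this thread: a leecher that rotates IP addresses or fakes a crawler user agent gets straight past it.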
The other question is: do we need this protection? Are we losing out?
I guess it depends. Personally, I would not be happy if a lot of my hard work collecting customers, improving the site and growing the number of PAD submitters was just leeched to build a rival website.
One thing to remember is that leechers are not normally that common. Back in the day eBay had the same problem with leechers and screen scrapers (I was one of them :) ). They rewrote the code and made it less easy to leech.
I'm still a little unsure as to what is getting leeched. Is someone setting up a duplicate site by copying the HTML from softtester?
On DP, I've seen people selling scripts for PAD sites whereby the script "automatically" updates by scouring the net for PADs. Is this an example of a "leecher" script?
PADs are public property and it's pretty easy to get your hands on 30,000 to 50,000 of them. What else can a leecher leech? They can't leech backlinks, so a duplicate site still wouldn't perform as well as the original.
The immediate problem I can see is the bandwidth. What other symptoms are you guys seeing? Can you see a copy of the content on another site?
I guess those massive lists of PAD files for shareware site owners (which I used initially too) are basically just links. The owner still has to read the files, which are on different sites, and all this checking takes time.
I guess a program / script which could go round (leech) getting the latest information would be extremely useful and save time.
I like the idea of how they "rewrote the code and made it less easy to leech". What kind of things did they do?
From an SEO point of view, the fact that the page has changed could look good to Googlebot too.
I'd be happier with a less interactive approach. I mean, what am I going to do when I get an email?
I'd just like to lock them out.
Any solution to this problem is looking very messy, or has numerous issues attached.
You can't use PHP session variables as these are destroyed once the browser / program closes the page. You can't use cookies as the programs probably won't use them. And using a database will make things slower and impact resources.
Doomed :(
An idea: what about if we use a database and store the IP as numbers, e.g.
iIP1 as int e.g. 200
iIP2 as int e.g. 1
iIP3 as int e.g. 78
iIP4 as int e.g. 56
iDay as int e.g. 365
Then we can do a select whatever where iIP1 = CurrentIPFirstPart etc.
Have an index on all 5 fields.
Thoughts?
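To make that idea concrete, here is a rough sketch using Python's built-in `sqlite3` module. Only the five fields `iIP1` to `iIP4` and `iDay` come from the suggestion above. The table name `ip_hits`, the counter column `iHits` and the function `record_hit` are assumptions added for illustration, and the real site would presumably use its own database rather than SQLite. The composite primary key doubles as the index on all 5 fields.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
conn.execute(
    """CREATE TABLE ip_hits (
           iIP1  INTEGER NOT NULL,
           iIP2  INTEGER NOT NULL,
           iIP3  INTEGER NOT NULL,
           iIP4  INTEGER NOT NULL,
           iDay  INTEGER NOT NULL,            -- day of the year, e.g. 365
           iHits INTEGER NOT NULL DEFAULT 0,  -- hypothetical counter column
           PRIMARY KEY (iIP1, iIP2, iIP3, iIP4, iDay)
       )"""
)

WHERE_KEY = "iIP1 = ? AND iIP2 = ? AND iIP3 = ? AND iIP4 = ? AND iDay = ?"


def record_hit(ip, day=None):
    """Count one page view for this IP on this day; return the running total."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4:
        raise ValueError("expected a dotted IPv4 address")
    day = date.today().timetuple().tm_yday if day is None else day
    key = (*parts, day)

    # Create the row on the first hit of the day, then bump the counter.
    conn.execute(
        "INSERT OR IGNORE INTO ip_hits (iIP1, iIP2, iIP3, iIP4, iDay) "
        "VALUES (?, ?, ?, ?, ?)",
        key,
    )
    conn.execute("UPDATE ip_hits SET iHits = iHits + 1 WHERE " + WHERE_KEY, key)
    return conn.execute(
        "SELECT iHits FROM ip_hits WHERE " + WHERE_KEY, key
    ).fetchone()[0]
```

Calling `record_hit("200.1.78.56", day=365)` twice returns 1 and then 2. Note that this is a write and a read on every page view, so the cost on a busy site is a fair concern.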
Thinking about it, if somebody wanted to leech your site, no amount of protection that you add will stop those who are serious about getting the data.
Nothing these days is impossible to work around.
Storing it in a database seems like a good idea; however, I have a feeling that on a site your size, with traffic as it is, it would have a big impact on the system, as your site will be issuing an insert / select command for every page viewed, constantly.
I agree with Blue. To defeat this kind of protection, all
that has to happen is the IP address changes.
PAD files are public property anyway so someone can
recreate a shareware site by a number of means.
Leeching is a bit lazy but you can buy 50,000 PADs quite
easily on the web.
My concern would be the traffic/ bandwidth consumed by
leeching. Can your friendly host simply block access
from the leech's IP?
Yep, we can block IP addresses; at the moment it is a manual process.
Hmmm, I wonder...
Can I automate producing this log?
Could I then automate blocking IP addresses?
Would this be possible, using a scheduled PHP script?
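As a rough illustration of what such a scheduled job could do, the sketch below counts requests per IP in an access log and returns the addresses that go over a limit. It is Python rather than PHP, just to show the logic. It assumes log lines that start with the client's IPv4 address (as in the common Apache log format). The threshold, the allowlist and the function name `ips_to_block` are invented for the example. How the resulting list is turned into actual blocks depends on the host, so that part is left out.

```python
import re
from collections import Counter

# Assumption: each log line begins with the client's IPv4 address.
LINE_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")
THRESHOLD = 1000  # hypothetical number of requests per log before blocking


def ips_to_block(log_lines, threshold=THRESHOLD, allowlist=()):
    """Return a sorted list of IPs with more than `threshold` requests."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:  # lines that do not start with an IP are skipped
            counts[match.group(1)] += 1
    return sorted(
        ip for ip, hits in counts.items()
        if hits > threshold and ip not in allowlist
    )
```

Run once a day from a scheduler over that day's log, this would give a candidate block list. The allowlist is there so that known search engine spiders are never blocked by mistake.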