For now, I recommend three WordPress plugins: Anti-spam by CleanTalk, WP Scraper, and Bulk Delete. All three have free trial versions, with affordable upgrades that add features for extended usage. Look these plugin names up in the “Add Plugin” search field rather than downloading and uploading them separately from their websites; the WordPress “More Information” links provide good documentation, feature lists, installation instructions, FAQs, and screenshots. Once each plugin is installed and activated, experiment with the manual gate/block/delete utilities for backlinks and other ad/HTML data: keep what serves you best, proactively block what doesn’t, and trash (or in some cases permanently delete) what no longer serves your needs, until the process is automated. And it’s not all about money and payment confirmation; an unlimited number of criteria can be served.
Anti-spam by CleanTalk will keep out anything containing black-listed words, admit what is white-listed, and block IP addresses (including their octet variations) and the domains/sub-domains of universally black-listed “spammers.” The traditional assumption is that robots are bad and humans are good, but it can very much be the other way around, and the two will become more indistinguishable over time. Only confirmation of payment, before and after, for you and others you can identify, with the backlinks you and authorized personnel add to static (non-modifiable-by-outside-visitors) pages preserved, even when you don’t care about affiliate commissions, can keep robot/human sellers honest who want to avoid the world black-list. In most cases the robot seller will also want to be an affiliate/reseller. For other criteria CleanTalk doesn’t really “clean,” but it is a good firewall/blocker: you and the community analyze a library of good and bad contributors, with automatic blocking of bad IPs/octet variations and domains/sub-domains already in the world black-list, and you can add whatever you want to that black-list. As robots and people become more difficult to distinguish, with the worst of the worst acting on nefarious intentions, it is imperative we create a proactive blocking mechanism that can become air-tight.
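To make the octet-variation idea concrete, here is a minimal sketch (not CleanTalk’s actual implementation) that treats any address in the same /24 network as a black-listed IP as blocked, since spammers often rotate only the fourth octet. The addresses are placeholder examples from the reserved documentation ranges:

```python
import ipaddress

# Hypothetical black-list; real deployments would pull from the shared
# community database described above.
BLACKLIST = {"203.0.113.7", "198.51.100.42"}  # example documentation addresses

def is_blocked(ip: str, blacklist=BLACKLIST) -> bool:
    """Return True if ip matches a black-listed address or its /24 neighborhood
    (i.e., the same first three octets with any fourth octet)."""
    candidate = ipaddress.ip_address(ip)
    for bad in blacklist:
        network = ipaddress.ip_network(bad + "/24", strict=False)
        if candidate in network:
            return True
    return False

print(is_blocked("203.0.113.99"))  # True: only the fourth octet differs
print(is_blocked("192.0.2.1"))     # False: not near any black-listed address
```

A real blocker would also consult domain/sub-domain lists and the word black-list/white-list, but the same membership test applies.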
WP Scraper will scrape websites like the dashboard of your PayPal account* (signing up and/or logging in with pre-defined data; black hat scrapers sometimes use artificially generated usernames/passwords), including data on your own website, or here at Panda Busters, such as sellers scraping reseller data with their full cooperation. From PayPal you can grab/scrape the payer’s name, domain, PayPal e-mail address, amount paid to you, and the date/time stamp, and place them in a spreadsheet like Excel or Google Docs (better yet, automate this with WP Scraper for content-management-system database storage and pre-defined ongoing calculations, run by 24/7 cron jobs). The stipulated protocol is for sellers in the Panda Busters network to pay no later than the 15th of the following month, 12 midnight GMT, for all earnings verified by the affiliate management system we offer when you sign up as a seller (free for 14 days, no credit card obligation; or sign up for a new account and easily repopulate your seller data from hard-drive-stored values if you don’t attract at least one affiliate, with a short learning curve). Pay each affiliate once, easy to do en masse with one keystroke, for all prior-month transactions. As resellers/affiliates, on the day after payments are due, you can easily white-list everyone who paid something and by default black-list those who didn’t, unless they meet other criteria, such as your own or other rights-assigned contributors (e.g. authors, editors, etc.) or outside visitors you have hopes for, under any of the nine criteria below, weighted by importance. Scrapebox.com will accomplish the same tasks, only it is a desktop application, not an automated WordPress plugin.
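The pay-by-the-15th white-list/black-list split can be sketched as follows. This is an illustration only: the column names (`payer_domain`, `paid_on`) are invented for the example and are not actual PayPal export headers.

```python
import csv
import io
from datetime import date

# Invented sample of scraped payment records; real data would come from
# WP Scraper or a downloaded transactions report.
CSV_DATA = """payer_domain,amount,paid_on
example-seller.com,25.00,2015-06-12
late-seller.net,10.00,2015-06-20
"""

DEADLINE_DAY = 15  # payment due by the 15th of the following month

def split_payers(csv_text):
    """Partition payer domains into white-list (paid on time) and black-list (late)."""
    whitelist, blacklist = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        paid_on = date.fromisoformat(row["paid_on"])
        target = whitelist if paid_on.day <= DEADLINE_DAY else blacklist
        target.append(row["payer_domain"])
    return whitelist, blacklist

white, black = split_payers(CSV_DATA)
print(white)  # ['example-seller.com']
print(black)  # ['late-seller.net']
```

The same partition could then feed CleanTalk’s white-list and black-list automatically.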
Bulk Delete will allow you to trash (temporarily dispose of) anything you don’t want, such as black-listed words, domains, etc. (under custom fields), and to schedule permanent deletion at set time intervals to save hard-drive space. It will delete tags, categories, and many other items as well. Users with assigned usernames/passwords and roles (administrators, editors, authors, subscribers, contributors, etc.) are usually considered “friendlier” than anonymous visitors, so there is less often a need to delete their information. You can target page/post “types”: visitor-affected dynamic pages (forums, topics, blogs, etc.) are historically more likely to carry unfavorable visitor activity, and you can retain paying sellers while rejecting those who don’t pay. Of course there will be conditions where you want to retain ads/backlinks/HTML from outsiders who didn’t pay, based on factors like black-list/white-list considerations; experiment until your criteria are right. Once robots everywhere find out “black hat doesn’t pay,” it is best to delete any URL string that is not in the sellers’ list of friendly robots. You can scrape or copy/paste URLs from the seller search when you sign up as a reseller, then use Excel to exclude those URLs from your blogs, forums, etc., or, much more easily, simply white-list the friendly URLs with Anti-spam.
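The trash-then-delete logic amounts to a string match over post content. Here is a hedged sketch of that idea (the post structure and black-listed strings are invented for illustration; Bulk Delete itself works through the WordPress admin UI, not this code):

```python
# Hypothetical black-listed strings, e.g. domains and words to purge.
BLACKLISTED_STRINGS = ["spam-domain.example", "cheap-pills"]

# Invented stand-ins for dynamic posts/pages (forums, topics, blogs).
posts = [
    {"id": 1, "html": '<a href="http://spam-domain.example/x">buy now</a>'},
    {"id": 2, "html": "<p>Legitimate seller content.</p>"},
]

def trash_candidates(posts, blacklist):
    """Return IDs of posts to move to trash (temporary disposal) for later
    scheduled permanent deletion."""
    return [p["id"] for p in posts
            if any(bad in p["html"].lower() for bad in blacklist)]

print(trash_candidates(posts, BLACKLISTED_STRINGS))  # [1]
```

White-listed sellers’ URLs would simply be excluded from `BLACKLISTED_STRINGS`, matching the “retain paying sellers, reject the rest” workflow above.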
In a nutshell, here are the nine criteria I currently believe are best for proactive and retroactive filtration:
- IP address/octet variations – banned if already in the universal black-list database, now mostly maintained through the cooperation of webmasters based on presumed unwanted/non-beneficial spamming/advertising, especially when seen in higher volume with only slight fourth-octet variations.
- Domain/sub-domain variations – backlinks of unwanted spammers. Banned if already in the universal black-list database, now mostly maintained through the cooperation of webmasters based on presumed unwanted/non-beneficial spamming/advertising, especially when seen in higher volume with only slight sub-domain variations.
- Black-listed words – words, word combinations, or character-string combinations (scanning all HTML in dynamic posts/pages), banned proactively, with possible future updates.
- White-listed words – words, word combinations, or character-string combinations (scanning all HTML in dynamic posts/pages), allowed in proactively, with possible future updates.
- Complaint score – can be based on selected historic time intervals/ranges. The percentage of complaints from resellers relative to the highest score (the number of votes, as a percentage of the seller pool, counts for 50%; the average rating counts for the other 50%).
- Recommendation history – can be based on selected historic time intervals/ranges. The percentage of recommendations from resellers relative to the highest score (the number of votes, as a percentage of the seller pool, counts for 50%; the average rating counts for the other 50%).
- pre-Woodrank – a score of 0 to 10 for the likelihood of a high Google ranking for the anchor text/keywords in a backlink, based on the projected best score independent of backlink importance. This means the best keyword frequency, prominence, and proximity in viewable text, meta-tags, comments, document size, and domain/URL string, for both the press-release page/post and its corresponding landing page, with the best keyword relevance between the two documents but a low plagiarism penalty for the best anticipated “optimum overlap” percentage. The plagiarism penalty must stay low, since pointing to “original content” is paramount for outbound links, both for you as a reseller and for the resellers below you pointing to the press release. Also factored into the score is the anchor text’s “upward trendiness” value: the slope from linear-regression calculations, averaged over the total future time interval (equal to the longest available history of accumulated search numbers), with a bias toward short-term over long-term slope calculations. The weighted averages, i.e. the relative importance of each of these sub-criteria, will change over time based on Web-position/Pagerank interpolation-and-deformulation analysis, as Pagerank becomes better understood. The current number of competitors for the keywords, based on a search for the keywords in quotes, is better when lower and will also be considered.
- Woodrank – the same as pre-Woodrank, except that the backlink count, the number of backlink tiers, and the percentage of the overall Pagerank score are also assessed. This is useful when a reseller wants sellers who already have a history of backlinks and are more successful: the potential for the seller to backlink to the reseller from many tiers below means an established, highly backlinked seller-reseller relationship can excel more than virgin prospectors.
- Proven payment history – the seller is expected to have a proven track record of paying other resellers; for now this is the average pay-out to past resellers, which the reseller can stipulate for selected past time intervals/ranges. (A future update will let resellers allow their PayPal/e-mail payment-processor information to be auto-scraped via PandaBusters.com username/password login access, with a cron job that will probably access each account once per month.)
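The complaint and recommendation scores above share one formula: vote volume (as a share of the seller pool) counts for 50%, and the average rating (relative to the highest average) counts for the other 50%. A minimal sketch of that weighting, with all inputs invented for the example:

```python
def score(votes, avg_rating, pool_size, max_avg_rating):
    """Combine vote volume and rating quality, each weighted at 50%,
    as described for the complaint/recommendation criteria."""
    vote_component = votes / pool_size              # share of sellers who voted
    rating_component = avg_rating / max_avg_rating  # relative to the best average
    return 0.5 * vote_component + 0.5 * rating_component

# Example: 40 of 100 sellers voted, average rating 4.0 vs. a best of 5.0.
print(score(40, 4.0, 100, 5.0))  # 0.6
```

The same function works for either criterion; only the input data (complaints vs. recommendations over the chosen time range) changes.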
* Until WP Scraper and its competition (the desktop software Scrapebox.com) improve on and add better unblocking capabilities (Scrapebox training for regex pattern-matching and feature/function identification, username/password customization, human-speed emulation and reCAPTCHA breaking, integration with proxy scraping, etc.), I advise people to simply download their transactions report from PayPal for “Payments Received,” for the range of the 1st to the 15th of the current month. This gives you a CSV file with each seller’s info (domain, PayPal address, etc.), obtained on the 16th of every month; white-list those domains/strings, and perform the necessary page/link adding and subtracting with Bulk Delete and CleanTalk monthly.