PHP

How to filter multiple RSS feeds with SimplePie

As for installing SimplePie, it was a relatively painless operation. I was able to install the RSS library using Softaculous. This install comes with a working demo which I decided to modify to fit my needs, instead of starting from scratch, as it seemed like multiple people were having issues installing SimplePie 1.3.

As stated in my post about my problems starting a link reclamation campaign, I decided to learn how to use SimplePie in order to have an extremely malleable presentation as well as an easy way to filter out useless mentions in the multiple RSS feeds. For example, I don’t want to be notified about a Craigslist ad mentioning my client.

In fact, I was easily able to build code filters to deal with dirty data. In addition, it will be quite easy to add to these filters as I acquire links on more and more domains.

Limiting Number of Items from Each RSS Feed

This is quite easy to implement. All it takes is a line of code after declaring your feeds.


// Combine your chosen RSS feeds into one feed
$feed->set_feed_url(array( /*insert your feeds here*/ ));

//Use this to limit the number of RSS items coming from each feed
$feed->set_item_limit(/*put your number here*/);

You can read more about implementing the snippet here.
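To make the effect of a per-feed limit more concrete, here is a rough plain-PHP sketch of the idea. It is not SimplePie's internal code: `merge_with_item_limit()` and the `title`/`date` array keys are hypothetical stand-ins for parsed feeds, used only for illustration.

```php
<?php
// Illustration only: NOT SimplePie internals. Each "feed" is a plain array of
// items, already ordered newest first, with hypothetical 'title'/'date' keys.
function merge_with_item_limit(array $feeds, int $limit): array
{
    $merged = array();
    foreach ($feeds as $items) {
        // Keep at most $limit items from this feed before merging.
        $merged = array_merge($merged, array_slice($items, 0, $limit));
    }
    // Sort the combined list newest first by Unix timestamp.
    usort($merged, function ($a, $b) {
        return $b['date'] <=> $a['date'];
    });
    return $merged;
}

$feeds = array(
    array(
        array('title' => 'A1', 'date' => 300),
        array('title' => 'A2', 'date' => 200),
        array('title' => 'A3', 'date' => 100),
    ),
    array(
        array('title' => 'B1', 'date' => 250),
    ),
);

$result = merge_with_item_limit($feeds, 2);
$titles = array_column($result, 'title');
echo implode(', ', $titles), "\n";
```

With a limit of 2, the third item of the first feed never reaches the merged list, which is what keeps one very active feed from crowding out the others.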

Filtering Out Domains, Keywords & Dates

This is where things start to get tricky. I wanted a filter that would look at each item in the feed and decide whether or not it should be included in my list of URLs on the page. I began by declaring an array of substrings that identify the URLs I knew were useless for link reclamation.


//Set up filter
$filter = array("youtube", "craigslist", "kijiji", "forum", "flickr", "dailymotion", "facebook");

The next step was to make sure that every item would pass through this filter and get flagged for removal if its URL included one of those words.

<?php  

// Let's begin looping through each individual news item in the feed.
foreach ($feed->get_items() as $item):

	// My filter flag: 0 = keep the item, 1 = filter it out
	$filtration = 0;

	/* When the item's URL contains a substring from our filter array, set the filtration flag and exit the verification loop */
	foreach ($filter as $token) {
		if (stristr($item->get_permalink(), $token) !== false) {
			$filtration = 1;
			break;
		}
	}
?>

You could also filter out items by title, date or description using the same type of verification loop: change $item->get_permalink() to the value you want to check, such as $item->get_title().

<?php			
//If the item passed the filtering system, it will get displayed on the page 
		if($filtration != 1){ ?>
	
			<div class="chunk">

<!-- If the item has a permalink back to the original post (which 99% of them do), link the item's title to it. -->
			
<h4><?php if ($item->get_permalink()) echo '<a href="' . $item->get_permalink() . '">'; echo $item->get_title(); if ($item->get_permalink()) echo '</a>'; ?></h4>

			</div>

<!-- Stop looping through each item once we've gone through all of them. -->
	
	<?php } ?>
	<?php endforeach; ?>
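The check inside the loop above can also be pulled into a small helper so the same tokens run against several fields at once, with an optional date cutoff. This is only a sketch under my own assumptions: `contains_token()` and `should_skip()` are hypothetical names, and plain strings stand in for the values you would read from `$item->get_permalink()`, `$item->get_title()` and so on.

```php
<?php
// Returns true when $text contains any of the filter tokens (case-insensitive).
function contains_token(string $text, array $tokens): bool
{
    foreach ($tokens as $token) {
        // Skip empty tokens so they never match everything.
        if ($token !== '' && stripos($text, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: $fields holds the strings read from the item (permalink,
// title, description...). $timestamp is the item's Unix time, for example
// (int) $item->get_date('U'); $cutoff is the oldest timestamp you still want
// to keep. Passing null for either one disables the date check.
function should_skip(array $fields, array $tokens, ?int $timestamp = null, ?int $cutoff = null): bool
{
    foreach ($fields as $field) {
        if (contains_token((string) $field, $tokens)) {
            return true;
        }
    }
    return $cutoff !== null && $timestamp !== null && $timestamp < $cutoff;
}

$filter = array("youtube", "craigslist", "kijiji", "forum", "flickr", "dailymotion", "facebook");

$skipVideo = should_skip(array('https://www.YouTube.com/watch?v=abc', 'Some title'), $filter);
$keepPost  = should_skip(array('https://example.com/post', 'A useful mention'), $filter);
$skipOld   = should_skip(array('https://example.com/post'), $filter, 1000, 2000);
var_dump($skipVideo, $keepPost, $skipOld);
```

Inside the feed loop, the call would replace the inner `foreach`, with `$filtration` set to 1 whenever `should_skip()` returns true.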

Removing Duplicate URLs From Multiple Feeds

While I haven’t had to clean up duplicate listings, it’s also simple to modify this loop to remove duplicate titles from the list. You can learn more about doing this in the answer to this Stack Overflow question.
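As a rough sketch of one way to approach this (not necessarily the method from the linked answer), you can remember every title and link already seen and skip any item that repeats one. The function name `remove_duplicates()` and the `title`/`link` array keys are hypothetical; in the real loop those values would come from `$item->get_title()` and `$item->get_permalink()`.

```php
<?php
// Sketch: drop items whose title OR link has already been seen.
// Items are plain arrays with hypothetical 'title' and 'link' keys.
function remove_duplicates(array $items): array
{
    $seenTitles = array();
    $seenLinks = array();
    $unique = array();

    foreach ($items as $item) {
        // Normalise so "Hello World " and "hello world" count as the same title,
        // and a trailing slash does not make a link look different.
        $title = strtolower(trim((string) $item['title']));
        $link = rtrim(trim((string) $item['link']), '/');

        $dupTitle = $title !== '' && isset($seenTitles[$title]);
        $dupLink = $link !== '' && isset($seenLinks[$link]);
        if ($dupTitle || $dupLink) {
            continue; // duplicate by title or by link
        }

        // Record non-empty values only, so blank titles never collide.
        if ($title !== '') {
            $seenTitles[$title] = true;
        }
        if ($link !== '') {
            $seenLinks[$link] = true;
        }
        $unique[] = $item;
    }
    return $unique;
}

$items = array(
    array('title' => 'Hello World', 'link' => 'http://a.com/1'),
    array('title' => 'hello world ', 'link' => 'http://b.com/2'), // same title
    array('title' => 'Other', 'link' => 'http://a.com/1/'),       // same link
    array('title' => 'Third', 'link' => 'http://c.com/3'),
);

$unique = remove_duplicates($items);
$uniqueTitles = array_column($unique, 'title');
echo implode(', ', $uniqueTitles), "\n";
```

The first occurrence always wins, so the order in which the feeds are merged decides which copy of a duplicate stays on the page.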

4 Comments

  • Reply
    Lary
    June 3, 2014 at 3:23 pm

    This was a great solution to keyword filtering of simplepie. Thank you!

  • Reply
    Mario
    August 31, 2015 at 6:41 pm

    Hi,

    Thanks for the write up but I’m in need of some help. I’m trying to apply this solution to my simplepie code but it is not working. I want to filter URLs that include youtube and facebook. Also, I would like to remove duplicates by title OR link. My code is below and I don’t know where the $filter is placed.

    // Include SimplePie
    include_once('./php/autoloader.php');
    include_once('./idn/idna_convert.class.php');

    //grab the feed
    $feed = new SimplePie();

    $feed->set_feed_url(array(
    'myblog.com/feed1',
    'friendblog.com/feed2',
    'otherblog.com/feed3',
    ));

    //enable caching
    $feed->enable_cache(true);

    //provide the caching folder
    $feed->set_cache_location('cache');

    //set the amount of seconds you want to cache the feed
    $feed->set_cache_duration(1800);

    //init the process
    $feed->init();

    //let simplepie handle the content type (atom, RSS…)
    $feed->handle_content_type();

    // Loop through all of the items in the feed
    foreach ($feed->get_items() as $item) {

    // Calculate 12 hours ago
    $today = time() - (12*60*60);

    // Compare the timestamp of the feed item with 12 hours ago.
    if ($item->get_date('U') > $today) {

    // If the item was posted within the last 12 hours, store the item in our array we set up.
    $new[] = $item;

    }
    }
    // Loop through all of the items in the new array and display whatever we want.
    foreach($new as $item) {
    ?>

    <a href="”>
    <?php
    echo '’;
    }
    ?>

  • Reply
    ML
    February 17, 2017 at 3:11 pm

    Hey Phil,
    Just found this page through ggle, while I was searching on how to filter out feed items which do not have a title, or the title tag is empty.

    Do you know a way to do this? Thanks
