Estimating an article's average reading time (Python)

Showing an estimated reading time next to your site's articles can benefit your end users in several ways. First, it lets them set aside the time they need to read an article in full. Second, it helps them choose the right article for the time they have available. Lastly, it opens up a whole new range of features, sorting options and filters you can offer (like filtering articles by reading time).

In this post, I will walk you through how to estimate the reading time of any public article URL by crawling the page and making a few simple calculations (written in Python). By the way, this post was estimated to be 6 minutes.

 

Estimating words per minute

Words per minute, commonly abbreviated WPM, is a measure of words processed in a minute, often used to measure typing or reading speed. Estimating WPM comes with several complications. First, average reading time is subjective. Second, the length of words varies: some words can be read very quickly (like 'dog') while others take much longer (like 'rhinoceros'). Therefore, a word is often standardized to be five characters long. Other parameters also affect reading time, such as font type and size, your age, whether you're reading on a monitor or paper, and even the number of paragraphs, images and buttons on the article's page.

Based on research done in this field, people are able to read English at 200 WPM on paper, and 180 WPM on a monitor (the current record is 290 WPM).  

For the sake of simplicity, we'll define a word as five characters (including spaces and punctuation), and WPM = 200. Feel free to add additional parameters to your calculation. Note that if all you're looking for is a broad estimation, what we've defined will suffice.
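
The definition above boils down to two divisions: characters to standardized words, then words to minutes. Here is a minimal sketch of that arithmetic in Python 3; the function name estimate_minutes is hypothetical and only used for illustration.

```python
import math

WPM = 200          # assumed average reading speed, as chosen above
WORD_LENGTH = 5    # characters per standardized "word", spaces and punctuation included


def estimate_minutes(text, wpm=WPM, word_length=WORD_LENGTH):
    """Return the estimated reading time of `text` in minutes."""
    standardized_words = len(text) / word_length
    return standardized_words / wpm


sample = "x" * 6000  # stands in for a 6,000-character article
print(estimate_minutes(sample))              # 6.0
print(math.ceil(estimate_minutes("hello")))  # 1 (very short texts round up to a minute)
```

Rounding up with math.ceil is a design choice for display purposes; the raw value is a fraction of a minute.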

 

From URL to Estimating reading time

Let's design the simple algorithm:

  1. Extract visible webpage text (title, subtitle, body, page buttons, etc.) from given url.
  2. Filter unnecessary content from text.
  3. Estimate filtered text reading time.

1. Extracting visible webpage text

In order to extract a webpage's text content, we'll use the Python libraries BeautifulSoup and urllib:

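import urllib.request

import bs4

def extract_text(url):
    # Download the raw HTML of the page (Python 3).
    html = urllib.request.urlopen(url).read()
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # Collect every text node in the document, visible or not.
    texts = soup.find_all(string=True)
    return texts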

2. Filter unnecessary page content

Once we've extracted the text content, we need to filter out everything unnecessary, such as styles (CSS), scripts (JS), HTML headers, comments, etc.:

def is_visible(element):
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    elif isinstance(element, bs4.element.Comment):
        return False
    elif element.string == "\n":
        return False
    return True

def filter_visible_text(page_texts):
    return filter(is_visible, page_texts)
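
If you'd rather avoid a third-party dependency, the same filtering idea can be approximated with the HTMLParser class from Python 3's standard library. This is a minimal sketch and a swapped-in technique, not the implementation used in this post: VisibleTextParser, visible_text and HIDDEN_TAGS are hypothetical names, and unlike is_visible it drops every whitespace-only string rather than just "\n".

```python
from html.parser import HTMLParser

# Tags whose text content should not count as visible (assumption: mirrors is_visible).
HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    """Collects text that is not nested inside a hidden tag."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # HTML comments are routed to handle_comment, so they never reach this method.
        if self.hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><!-- a comment --><h1>Hello</h1><p>Visible text</p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_text(page))  # ['Hello', 'Visible text']
```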

3. Estimate reading time

To estimate the article's reading time, we need to count the number of words (as defined above) and divide by the defined WPM (200):

WPM = 200
WORD_LENGTH = 5

def count_words_in_text(text_list, word_length):
    total_words = 0
    for current_text in text_list:
        total_words += len(current_text)/word_length
    return total_words

def estimate_reading_time(url):
    texts = extract_text(url)  # extract_text is the function defined in step 1
    filtered_text = filter_visible_text(texts)
    total_words = count_words_in_text(filtered_text, WORD_LENGTH)
    return total_words/WPM
    

That's it! Feel free to test it out with any string url, by calling the method estimate_reading_time.

To view the source code, please visit my GitHub page. If you have any questions, feel free to drop me a line.

How to improve your chatbot in 3 simple steps

I have tested hundreds of chatbots and came to realize a key reason why they fail in user experience. You may be thinking it has to do with the chatbot's purpose or the lack of powerful AI, but that's not usually the case. Actually, many chatbots have a very good purpose, and do solve a real problem or pain.

The main reason these chatbots fail is a key element developers miss: users expect to chat with your chatbot, not fill in a form. I will use a weather bot called Weabo as an example throughout this post. Let's take a look at the following conversation:

[Screenshot: example conversation with Weabo]

The above conversation usually occurs when your conversation logic looks something like this:

function generate_response(message, current_state) {
  ...
  if (current_state == "zipcode.get") {
    response = "This is not a valid Zipcode. Please try again";
    if (is_valid_zipcode(message)) {
      save_user_zipcode(message);
      response = get_weather(message);
    }
  }
  ...
  return response
}

As much as we would hope so, users don't always follow the conversation flows we as bot developers have designed and expected. For most users this might even be their first time talking to a chatbot. They don't understand the complexity of NLP/NLU and how fragile a bot's understanding can be. So you can't blame them for doing what they're supposed to do: simply chat. That's why you shouldn't assume specific and strict user input, because in many cases the user will get stuck in an infinite loop, lose patience and dump your chatbot.

Here are 3 steps you can take to significantly improve your conversational experience without much work:

1. Provide small talk context understanding

In my opinion, every chatbot should be able to understand and respond to basic small talk. You don't need to be an NLP or machine learning expert to add small talk to your chatbot. Just get familiar with one of the many great 3rd party solutions out there, such as Api.ai, Wit.ai, Lex, etc. These offer simple out-of-the-box support for small talk. So, for example, if the user asks “What can you do”, you can easily catch that via the 3rd party API and provide the appropriate response. Check out Api.ai's solution for small talk, which I personally recommend and have found very useful.

To summarize thus far, supply small talk understanding for anything from a basic “hello” or “thank you” to specific questions such as “what can you do?” In my opinion, this shouldn't take you more than a day's work. Better yet, once you're obligated to answer questions such as “what can you do?”, you'll be pushed to really tighten your bot's purpose and understand what's unique about it.

2. Keep your conversation flow logic loose

Conversation states are crucial to any chatbot, so if you're going in that direction, good job. However, don't build your flow logic such that you're expecting a specific and strict answer from users, because that's where you can seriously fail. Instead, loosen your logic and accept the fact that users might decide to deviate from the flow you've built.

All you have to do is reverse your current logic. If you're expecting a Zip code as input at some conversation state, match the current state with the appropriate response only if you've first identified that there's a valid Zip code in the user's input. Otherwise, treat the current user input as a stateless message, ignore the user's current state and respond accordingly. Also, take into consideration the intent retrieved from the 3rd party service you've decided to integrate. Let's look at a refined version of the logic above:

function generate_response(message, current_state, intent) {
  ...
  if (intent == "smalltalk.name.get") {
    response = "My name is Weabo :)";
  } else if (intent == "smalltalk.help") {
    response = "Sure! You can type 'What is the weather in X?'"
  } else if (intent == "weather.get") {
    response = get_weather(message)
  } else if (is_valid_zipcode(message)) {
    if (current_state == "zipcode.get") {
      save_user_zipcode(message);
      response = get_weather(message);
    }
  }
  ...
  return response
}

After implementing this logic, the conversation example above would look like this:

[Screenshot: the conversation with Weabo after applying this logic]

To summarize, first try to understand the intent/meaning of every incoming message, and only in the right cases match it with the user's current state and respond accordingly.
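
The "intent first, state second" ordering can also be sketched as a small Python 3 routing function. This is only an illustration under stated assumptions: route and find_zipcode are hypothetical names, and the pattern assumes 5-digit US Zip codes, which may not match what your bot needs.

```python
import re

ZIP_PATTERN = re.compile(r"\b\d{5}\b")  # assumption: 5-digit US Zip codes only


def find_zipcode(message):
    """Return the first 5-digit Zip code found anywhere in the message, or None."""
    match = ZIP_PATTERN.search(message)
    return match.group(0) if match else None


def route(message, current_state, intent=None):
    """Decide which branch should answer: intent first, then state, then stateless."""
    if intent is not None:
        return ("intent", intent)
    zipcode = find_zipcode(message)
    if zipcode and current_state == "zipcode.get":
        return ("zipcode", zipcode)
    return ("stateless", None)


print(route("what can you do?", "zipcode.get", intent="smalltalk.help"))  # ('intent', 'smalltalk.help')
print(route("it's 10001 I think", "zipcode.get"))                          # ('zipcode', '10001')
print(route("hello there", "zipcode.get"))                                 # ('stateless', None)
```

Searching for the Zip code inside the message, instead of requiring the whole message to be one, is what lets the user answer in free text.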

3. Redirect unknown intents to what you do know

Lastly, I want to focus on unknown intents. Unknown intents are messages that the chatbot did not understand or does not know how to respond to. I have found that between 70% and 80% of all user input falls into unknown intents. There are hundreds of blog posts I could write about how to improve your bot's logic in this case, but for now I'll focus on one: redirect the user to what your bot can understand. Think of the conversation as a pinball game. The user shoots a ball and your mission is to make sure the ball doesn't enter the drains. To achieve that, provide hints and responses about what your bot can understand. For example: “I might understand this in the future. For now, I can tell you the weather. Try this: what's the weather in new york”.

Most users just want to understand your chatbot's limits: what it can do, but mostly what it can't do. The more you help your users understand what your bot can't do, the less user input will fall into the unknown intents category.
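
A redirecting fallback can be very small. The Python 3 sketch below assumes your bot keeps a list of example requests it supports; KNOWN_EXAMPLES, fallback_response and the handlers dictionary are hypothetical names made up for this illustration.

```python
import random

# Hypothetical examples of what the bot can do; replace with your own bot's abilities.
KNOWN_EXAMPLES = [
    "what's the weather in New York",
    "what can you do",
]


def fallback_response(examples=KNOWN_EXAMPLES, rng=random):
    """Build a reply for an unknown intent that points back to a supported request."""
    suggestion = rng.choice(examples)
    return (
        "I might understand this in the future. "
        "For now, I can tell you the weather. "
        f"Try this: {suggestion}"
    )


def generate_response(intent, handlers):
    """Use the matching handler if there is one, otherwise redirect the user."""
    handler = handlers.get(intent)
    return handler() if handler else fallback_response()


handlers = {"smalltalk.name.get": lambda: "My name is Weabo :)"}
print(generate_response("smalltalk.name.get", handlers))
print(generate_response("some.unknown.intent", handlers))
```

Picking the suggestion at random is just one option; rotating through the examples would expose more of the bot's abilities over time.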

Summary

There is still so much you can do to improve your chatbot's conversational experience. What's important to understand is how simple it is to significantly improve your chatbot's user experience. More importantly, users want to chat, but mostly they want to understand your chatbot's abilities and how to improve their relationship with it. Treat the conversational interface as you would treat any basic human-to-human conversation, and forget what you've learned about web/app interfaces.

Last note: If you're not planning to provide basic free text understanding, consider moving to persistent menus/quick replies. It's better to limit the user's expectations at first than to disappoint them.

I hope this post helped you in some way and thank you for taking the time to read it. Feel free to drop a comment if you have any thoughts!

 

URL text summarizer using Web Crawling and NLP (Python)

To skip this tutorial, feel free to download the source code from my Github repo here.

I've been asked by a few friends to develop a feature for a WhatsApp chatbot of mine that summarizes articles based on URL inputs. So when a friend sends an article to a WhatsApp group, the bot will reply with a summary of the article at the given URL. I like this feature because, from my personal research, 65% of group users don't even click the shared URLs, but 97% of them will read a few lines of the article's summary.

As part of being a Fullstack developer, it is important to know how to choose the right stack for each product you develop, depending on the requirements and limitations. For web crawling, I love using Python. The Python community is filled with efficient, easy to implement open source libraries both for web crawling and text summarization. Once you’re done with this tutorial, you won’t believe how simple it is to implement the task.

 

GETTING STARTED

For this tutorial, we’ll be using two Python libraries:

  1. Web crawling - Beautiful Soup. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
  2. Text summarization - NLTK (Natural Language Toolkit). NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.

Go ahead and get familiar with the libraries before continuing, and make sure to install them locally. If you're having trouble installing the libraries, run these commands in your Terminal:

pip install beautifulsoup4
pip install -U nltk
pip install -U numpy
pip install -U setuptools
pip install -U sumy

After that, open Python command line and enter:

import nltk
nltk.download("stopwords")

 

THE ALGORITHM

Let's describe the algorithm:

  1. Get URL from user input
  2. Web crawl to extract the natural language from the URL html (by paragraphs <p>).
  3. Execute the summarize class algorithm (implemented using NLTK) on the extracted sentences.
    1. The algorithm ranks sentences according to the frequency of the words they contain, and the top sentences are selected for the final summary.
  4. Return the highest ranked sentences (I prefer 5) as a final summary.
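
To make the ranking in step 3 concrete, here is a dependency-free Python 3 sketch of frequency-based sentence scoring. It is a simplified stand-in, not the NLTK-based summarizer used in this tutorial: the stopword list is a tiny illustrative sample, the sentence splitter is naive, and the names split_sentences and summarize are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; the tutorial relies on NLTK's "stopwords" corpus instead.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "on", "for"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)  # stopwords are absent, so they score 0 below

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n])  # restore document order for readability
    return [sentences[i] for i in keep]


text = (
    "Python is great for web crawling. "
    "Crawling with Python takes little code. "
    "I had lunch."
)
print(summarize(text, 2))
# ['Python is great for web crawling.', 'Crawling with Python takes little code.']
```

Summing raw frequencies favors long sentences; dividing each score by the sentence's word count is a common adjustment if that becomes a problem.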

For step 2 (step 1 is self-explanatory), we'll develop a method called getTextFromURL as shown below:

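import requests  # third-party: pip install requests
from bs4 import BeautifulSoup

def getTextFromURL(url):
    # Fetch the page and join the text of all <p> elements into one string.
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return text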

The method initiates a get request to the given URL, and returns the extracted natural language from the URL html page.

For steps 3-4, we'll develop a method called summarizeURL as shown below:

def summarizeURL(url, total_pars):
    url_text = getTextFromURL(url).replace(u"Â", u"").replace(u"â", u"")
    fs = FrequencySummarizer()
    final_summary = fs.summarize(url_text.replace("\n"," "), total_pars)
    return " ".join(final_summary)

The method calls the method above to retrieve the text, then cleans it of stray characters (such as Â and â) and newlines (\n). It then runs the summarization algorithm (inspired by this post) on the text, which returns a list of the highest ranked sentences. That list is our final summary.

 

SUMMARY

That's it! Try it out with any URL and you'll get a pretty decent summary. The algorithm proposed in this article is, as stated, inspired by this post, which implements a simple text summarizer using the NLTK library. Many summarization algorithms have been proposed in recent years, and there's no doubt there are even better solutions. If you have any suggestions or recommendations, I'd love to hear about them, so comment below!

Feel free to download directly the source code via my Github account.

How to develop a Facebook Messenger bot (using Node.js and MongoDB)

To skip the tutorial, feel free to download the source code from my Github repo here.

In very little time, quite a few very good tutorials for beginners have appeared on how to develop a Facebook Messenger bot. However, these tutorials describe a stateless bot. This means that for every user who sends a message to your bot, no info is saved regarding their current state in the conversation or other basic details. This is why I've decided to write this tutorial, which consists of a basic implementation of a Facebook Messenger bot together with a working MongoDB library.

Saving each end user's state dynamically within a conversation is crucial for the UX of any basic chatbot. Saving states allows the bot to communicate with end users in a flow that follows some pattern, which otherwise would not be possible.

Getting started

For starters, you'll need a Facebook developers account which can be found here.

Secondly, follow the beginning of the process for creating a Facebook page and set up a 'Webhook' (up until step 5) by clicking here. Note: You should write down the verification code which you've provided to the webhook in the tutorial. Then, once you've got a Facebook page up and running, look up the page token, and send a POST request to the following URL:

https://graph.facebook.com/v2.6/me/subscribed_apps?access_token=<TOKEN_GOES_HERE>

You should get a response 'true' which means you've synced your Facebook page with the provided API.

Lastly, please get familiar with the basics of MongoDB, and make sure you understand the basics of writing Node.js and ES6.

Now let's create your very first Facebook Messenger chatbot!

Facebook API Structured messages

First things first. Understand and learn the basic concepts of the Facebook API - click here. Let's look at an example:

"welcome_message": {
  "attachment": {
    "type": "template",
    "payload": {
      "template_type": "button",
      "text": "Hello and welcome to your first bot. Would you like to see our products?",
      "buttons": [
        {
          "type": "postback",
          "title": "Yes",
          "payload": "get_options"
        },
        {
          "type": "postback",
          "title": "No",
          "payload": "no_options"
        }
      ]
    }
  }
}

In the example above, you can see that for every message sent to Facebook, we need to declare the type of message, which in this case is a template (for basic text, text is enough). In addition, we declare the template type, which is button in this case, and the buttons themselves. For every button, we need to declare the button title, type and payload. The type tells us how the button click is handled, and the payload lets us identify which button the user clicked (a further example is described in the source code).

Server side

The basic, required implementation on the server side is a GET handler for the /webhook/ URL and a POST handler for the same URL. The GET handler is for Facebook verification when registering your webhook URL and should be as follows:

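function facebookVerification(req, res) {
    if (req.query['hub.verify_token'] === WEBHOOK_TOKEN) {
        // Echo the challenge back so Facebook accepts the webhook.
        res.status(200).send(req.query['hub.challenge']);
    } else {
        console.error("Failed validation. Make sure the validation tokens match.");
        // Answer the request so it does not hang.
        res.sendStatus(403);
    }
}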

Note: the WEBHOOK_TOKEN above is to be stated as you've declared when initializing the webhook. Facebook shows an example with 'my_voice_is_my_password_verify_me'. You can leave it as is and update the source code.
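
Stripped of the web framework, the verification handshake is a single comparison. The Python 3 sketch below expresses it as a pure function so the decision is easy to test. verify_webhook is a hypothetical name, and answering a mismatch with status 403 is an assumption about how you want to reject the request.

```python
def verify_webhook(query, expected_token):
    """Return (status_code, body) for the GET verification request.

    `query` is a dict of the request's query-string parameters.
    """
    if query.get("hub.verify_token") == expected_token:
        # Echo the challenge back so the webhook is accepted.
        return 200, query.get("hub.challenge", "")
    # Assumption: reject mismatching tokens with 403 Forbidden.
    return 403, "Failed validation. Make sure the validation tokens match."


print(verify_webhook({"hub.verify_token": "secret", "hub.challenge": "1234"}, "secret"))  # (200, '1234')
print(verify_webhook({"hub.verify_token": "wrong"}, "secret")[0])                         # 403
```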

The second and most important method is the POST. Facebook Messenger sends every ICM (incoming message) sent to your bot page via POST to the URL you've declared in the developers portal. The method should handle all ICMs, whether they arrive as button clicks or as free text. I will describe three methods which are used in this case:

// 0 MongoDB info
const mongoose = require('mongoose');
const User = mongoose.model('User', {_id: String, name: String, profile_image_url: String, phone_number: String, current_state: String});
// 1
app.post('/webhook/', facebook_parser.facebookWebhookListener);
// 2
function facebookWebhookListener(req, res) {
    if (!req.body || !req.body.entry || !req.body.entry[0] || !req.body.entry[0].messaging) {
        return console.error("no entries on received body");
    }
    let messaging_events = req.body.entry[0].messaging;
    for (let messagingItem of messaging_events) {
        let user_id = messagingItem.sender.id;
        db_utils.getUserById(user_id, messagingItem, parseIncomingMSGSession);
    }
    res.sendStatus(200);
}
// 3
function getUserById(user_id, incomingMessage, callback) {
    var result = null;
    //Lets try to Find a user
    User.findById(user_id, function (err, userObj) {
        if (err) {
            console.log(err);
        } else if (userObj) {
            result = userObj;
            console.log('User ' + user_id + ' exists. Getting current user object:', userObj);
        } else {
            console.log('User not found!');
        }
        // After getting user object, forward to callback method.
        callback(user_id, incomingMessage, userObj);
    });
}
// 4
function parseIncomingMSGSession(user_id, messageItem, userObj) {
    var current_state = "welcome_message";
    if (userObj != null) {
        current_state = userObj.current_state;
    }
    // If we receive any text message, parse and respond accordingly
    if (messageItem.message && messageItem.message.text) {
        // Currently support a static welcome message only
        sendFacebookGenericMsg(user_id, message_templates.templates["welcome_message"]);
    }
    // If the user sends us a button click
    if (messageItem.postback && messageItem.postback.payload) {
        var button_payload_state = messageItem.postback.payload;
        switch (button_payload_state) {
            case "get_options":
                sendFacebookGenericMsg(user_id, message_templates.templates["options_message"]);
                break;
            case "no_options":
                sendFacbookTextMsg(user_id, "Ok. There is so much you can do with stateful bots!");
                break;
        }
    }
    // Save new user state. If user does not exist in DB, will create a new user.
    db_utils.setUserFieldById(user_id, "current_state", "");
}

The first step (commented) is for listening to POST requests and forwarding them to a method called facebookWebhookListener (method 2). This method then retrieves from the POST body the relevant info such as the message item (consists of user unique id, message text, etc) and forwards the content to a method called getUserById (method 3).

The method getUserById (method 3), uses the info set at the top (comment 0), and tries to retrieve a user with the given id in the DB. If the user is not found, a null will be returned, and the info is passed to a callback function which is in our case, parseIncomingMSGSession (method 4). 

The method parseIncomingMSGSession (method 4) is in charge of sending an OGM (outgoing message) based on the user info. In the case above, the default state is "welcome_message". Secondly, the method determines the type of the ICM, which could be either a text message or a button click (when the user clicks on buttons the bot provided). Based on the ICM and the user's state, a relevant message is sent. There are additional methods used in the code above which I will not explain, since they are pretty much self-explanatory and can be found in full in the source code provided at the top of this post (or at the end). Feel free to ask me any questions regarding any of the methods and the general flow of the server side.

Finally, in order to send back a response to the end user, you'll need to send a POST request with the message template as described above and with the following structure:

// Send generic template msg (could be options, images, etc.)
function sendFacebookGenericMsg(user_id, message_template) {
    request({
        url: 'https://graph.facebook.com/v2.6/me/messages',
        qs: {access_token: TOKEN},
        method: 'POST',
        json: {
            recipient: { id: user_id },
            message: message_template
        }
    }, facebookCallbackResponse);
}

function facebookCallbackResponse(error, response, body) {
    if (error) {
        console.log('Error sending messages: ', error)
    } else if (response.body.error) {
        console.log('Error: ', response.body.error)
    }
}

The TOKEN shown above is the page token you've received via the Facebook developers portal page. Congratulations! You've completed your very first Facebook messenger bot. The source code is built in such a way, that it'll be very easy for you to scale it up to a fully functional chatbot.

To view the full project source code, click the button below. Feel free to ask any questions you might have, and I'll answer you ASAP! 

Chatbots - The beginners guide

If you search for chatbots on Google, you'll probably come across hundreds of pages starting from what is a chatbot to how to build one. This is because we're in 2016, the year of the chatbots revolution.

I've been introduced to many people who are new to this space and are very interested and motivated to enter it, whether they're software developers, entrepreneurs, or just tech hobbyists. Entering this space for the first time has become overwhelming in just a few months, particularly after Facebook announced the release of the Messenger API at the F8 developer conference. For that reason, I've decided to simplify the basic steps of entering this fascinating world.

What is a chatbot?

To fully understand what a chatbot is and its potential, let's start by watching the following example:

Get the idea? The conversation example above, was conducted between an end user and a chatbot, built on the Facebook messenger platform.

So what is a chatbot? It is a piece of software that is designed to automate a specific task. More specifically, a chatbot is essentially a conversational user interface which can be plugged into a number of data sources via APIs so it can deliver information or services on demand, such as weather forecasts or breaking news. 

Why now?

Chatbots have been around for decades, right? So what is all this noise all of a sudden? This question has many different answers, depending on who you ask. If you ask me, there are two main reasons:

1. Messaging has become the most essential and most popular tool for communication. 

2. We're closer to AI (Artificial Intelligence) and NLP (Natural Language Processing) breakthroughs than ever before. This means that talking to a chatbot can come close to feeling as real as talking to a human. Today, developers can find many APIs that offer AI/NLP services and use them without even understanding how AI/NLP works. This is HUGE. A few examples I recommend are Crunchable.io, Chatbots.io, Luis.ai (a must!), API.ai and Wit.ai.

Basically, the point I'm trying to make is that messaging platforms are the place we all go to on a regular basis. So why not bring all the other places into these platforms? This is what Facebook did with Facebook Messenger.

Facebook Messenger is far more than a messenger app. It is a store for thousands of apps which are integrated into our daily conversations. Furthermore, as stated above, Facebook has released its chatbot platform in April, 2016. Since then, more than 11,000 bots have been added to Messenger by developers.

Where are the chatbots?

The first chatbot I built was on WhatsApp. I chose WhatsApp because all my friends use it as their main messaging platform. Unfortunately, WhatsApp doesn't offer an official API. This means that WhatsApp doesn't approve of building chatbots on its platform (not a surprise, since WhatsApp is a Facebook company, and Facebook itself offers an extensive API). This doesn't mean that there aren't any workarounds. If you're as stubborn as I am, take a look at yowsup and start from there. You'll also need a registered phone number before starting the process. So to conclude, WhatsApp is probably not the place you'll find rich and powerful chatbots.

Platforms that do offer official APIs are:

1. Facebook Messenger

2. Slack

3. Telegram

4. Kik

There are other deployment channels such as Android and iOS (via SMS), Skype and even Email. However, the ones listed above are the ones I would focus on.

You can find a rich list of most of the chatbots out there by clicking here, thanks to our friends at Botlist.co that did an amazing job. 

How do I build a chatbot?

This requires a long answer. An answer I will save for my next blog post, in which I will describe how to build your very first chatbot using Node.js and MongoDB.

If you're not a developer, or are looking for an easier approach which does not require programming, here are a few solutions for you:

1. Chatfuel - My first choice. No coding required. Easily add and edit content— what you see is what you get.

2. Botsify - Build a facebook messenger Chatbot without any coding knowledge. 

3. Meya.ai - Meya helps with the mechanics of bot building so you can concentrate on the fun stuff.

There are some downsides to using a service instead of building your own. First, using the above services limits your creativity in many ways, giving you only a glimpse of what can be done. Secondly, you are using a third party hosting service, which means you're stuck with them. Nevertheless, these are great solutions that will get you started with chatbots without the need for any coding knowledge.

Summary

There has been a lot of controversy over whether bots will succeed or fail in the near future. To understand the controversy, you have to understand the difference between "stupid" bots and "smart" bots. "Stupid" bots work with structured input, while "smart" bots process your natural language and provide a more human-to-human experience.

The main issue with "stupid" bots is that as soon as people start bundling things up, changing their minds, going back to what has been mentioned earlier in the chat, etc., the bot falls apart. Therefore, as long as chatbots can't fully conduct a conversation naturally, while understanding the intent of the user at every stage, bots will be limited and ineffective. 

Having said that, in my opinion, chatbots don't have to be smart in order to succeed. There are thousands of use cases in which a "stupid" chatbot can simplify the end user's experience and improve productivity on the business side. Take ordering pizza, for example. You can create a flow in which the user enters inputs based on questions and options. You can explicitly state the input you're expecting from the user, so the need for NLP or AI becomes irrelevant. I would rather order pizza from a "stupid" bot than over the phone or from some cheap website any day.

To fully summarize the above and much more, have a look at the Chatbot ecosystem, brought together by Business Insider.

Stay tuned for my next blog post, about how to develop your very first Facebook Messenger chatbot, using Node.js and MongoDB.

 

Recursively find all duplicate files in a directory (Java)

Intro

Sure, this could be implemented using a few lines of basic Linux commands. However, understanding how to write such code in Java requires understanding several topics: hash tables, recursion, lists, the file system, and more.

That is why I love this problem. Not only does it touch on many of Java's fundamentals, but it also demands a great deal of efficiency in terms of time and space complexity.

Duplicate detection using hash

The first problem to take into consideration is: how do I detect duplicated files? Should I only consider file names? What about file size? Maybe both? Considering both is still not enough, and it's pretty easy to come up with a counterexample. Take two files that are both called file.txt and live in different directories: the first contains the word "hello" and the second contains the word "world". Both files have the same name and size, but they are not identical. That is why my approach reads each file's bytes and computes a hash id from its contents.

    private static MessageDigest messageDigest;
    static {
        try {
            messageDigest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("cannot initialize SHA-512 hash function", e);
        }
    }

In the above code, we use a well-known secure hash function called SHA-512. We will use this function to create an id for each of the files in the file system; files with identical contents get the same id.

Duplicated files retrieval using Hash Table

Our second problem is how to store the file hashes so they can be retrieved efficiently later. One of the best structures for efficient retrieval is of course the hash table, which, if implemented properly, enables retrieval in O(1) time on average. We'll store the hash ids as keys, and for every key, the value will be a linked list containing all of the file paths (as Strings) that share that key. Such hash ids are very big numbers, which is why we'll also use Java's BigInteger class to convert them to hex strings.

And finally, we'll traverse all sub directories and files recursively, such that for each directory, traverse all of it's files. The final implementation is as follows:

    // Needs: java.io.File, java.io.IOException, java.math.BigInteger,
    // java.nio.file.Files, java.util.LinkedList, java.util.List, java.util.Map
    public static void findDuplicatedFiles(Map<String, List<String>> lists, File directory) {
        File[] children = directory.listFiles();
        if (children == null) {
            return; // not a directory, or its contents could not be listed
        }
        for (File child : children) {
            if (child.isDirectory()) {
                findDuplicatedFiles(lists, child);
            } else {
                try {
                    // Hash the file's full contents; files with equal contents get equal hashes
                    byte[] fileData = Files.readAllBytes(child.toPath());
                    String uniqueFileHash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                    List<String> list = lists.get(uniqueFileHash);
                    if (list == null) {
                        list = new LinkedList<String>();
                        lists.put(uniqueFileHash, list);
                    }
                    list.add(child.getAbsolutePath());
                } catch (IOException e) {
                    throw new RuntimeException("cannot read file " + child.getAbsolutePath(), e);
                }
            }
        }
    }

All that's left is to run the above method and print the hash table's values wherever duplicates exist (that is, wherever the associated linked list holds more than one path).

    // dir is assumed to be a File pointing at the root directory to scan
    Map<String, List<String>> lists = new HashMap<String, List<String>>();
    FindDuplicates.findDuplicatedFiles(lists, dir);
    for (List<String> list : lists.values()) {
        if (list.size() > 1) {
            System.out.println("\n");
            for (String file : list) {
                System.out.println(file);
            }
        }
    }
    System.out.println("\n");
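Since space complexity matters here, one variation worth considering is hashing each file in chunks instead of loading it whole. Below is a rough sketch of the same hash-to-paths grouping, written in Python for brevity. The function names and the 64 KB chunk size are my own assumptions, not part of the Java implementation above.

```python
import hashlib
import os
from collections import defaultdict

def file_hash(path, chunk_size=65536):
    """Hash a file's contents in chunks, so large files never sit fully in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicated_files(root):
    """Map each content hash to the paths sharing it, keeping only real duplicates."""
    groups = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            groups[file_hash(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling find_duplicated_files on a directory returns a dictionary whose values are lists of paths with identical contents, which mirrors the linked lists printed by the Java code.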

The source code can be found in the download link below:

Feel free to ask any related questions in the comments below.

How to create a dynamic HTML Email Template

Automated HTML emails have come a long way in the past couple of years. What used to be a text-only email now contains various forms, dynamic links, and images, all styled to match each company's own look.

Today, sending HTML emails is standard for most leading companies, which is why adopting this approach over regular text-only emails has become a must.

Developing HTML templates doesn't require a lot of coding skill. The real challenge is knowing how to code the template so that it appears correctly on all devices and old email clients.

In this blog post I will go through a step-by-step guide to building a dynamic email template with HTML and PHP.

Basic guidelines

As I've described above, the biggest challenge in developing an HTML email template is making sure it's cross-platform compatible. There are many email clients, such as Google Mail, Apple Mail, Outlook, AOL, Thunderbird, Yahoo!, Hotmail, Lotus Notes, and others. Some of them are light years behind in terms of CSS support, which means we must resort to HTML tables to control the layout. In practice, tables are the only way to get a layout that renders consistently across different mail clients. Think of the template as being constructed of tables within tables within tables...

Secondly, we must use inline CSS to control elements within the email, such as background colors and fonts. CSS style declarations should be very basic, without any external CSS files.

To illustrate the HTML tables rule above, see the example below, where I've set the border attribute of each table to be visible. Note that each %s is a placeholder where dynamic text and images will be filled in, as I'll describe shortly (scroll to the end to see the final email template):

 
[Screenshot: the email template layout with visible table borders]
 

As you can see above, the whole layout is built from HTML tables. We'll use PHP's sprintf function to fill each %s placeholder with dynamic text before an email is sent to the user.

Developing the static template

So let's start programming! Before we build the template itself, begin your HTML file with an XHTML doctype declaration and head:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Demystifying Email Design</title>
  <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
</html>

I recommend defining all tables with border="1", as seen above, since it makes it easier to spot errors and see the skeleton of the layout as you go along. First, let's create the basic layout:

<body style="margin: 0; padding: 0;">
 <table border="1" cellpadding="0" cellspacing="0" width="100%">
  <tr>
   <td>
    My first email template!
   </td>
  </tr>
 </table>
</body>

Set cellpadding and cellspacing to zero to avoid any unexpected space in the table. Also set the width to 100%: because styling of the body tag isn't fully supported, this table acts as the true body of our email.

Now, in place of the text 'My first email template!', we'll add another table that holds the actual email template.

<table align="center" border="1" cellpadding="0" cellspacing="0" width="600" style="border-collapse: collapse;">
 <tr>
  <td>
   This is the email template body
  </td>
 </tr>
</table>

As you can see, the width is set to 600 pixels. 600 pixels is a safe maximum width for your emails to display correctly on most email clients. In addition, set the border-collapse property to collapse in order to make sure there are no unwanted spaces between the tables and borders.

In the example above, you can see that our email template consists of five sections (rows), so we'll create these rows and then add a table to each one to complete the template.

<table align="center" border="1" cellpadding="0" cellspacing="0" width="600">
 <tr>
  <td>
   Row 1
  </td>
 </tr>
 <tr>
  <td>
   Row 2
  </td>
 </tr>
 <tr>
  <td>
   Row 3
  </td>
 </tr>
 <tr>
  <td>
   Row 4
  </td>
 </tr>
 <tr>
  <td>
   Row 5
  </td>
 </tr>
</table>

In each row, we'll create a new table, following the same methodology as above. We'll also add columns and padding as needed to align all the objects and reach the desired template.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>Automatic Email</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
<body style="margin:0; padding:10px 0 0 0;" bgcolor="#F8F8F8">
<table align="center" border="1" cellpadding="0" cellspacing="0" width="95%%">
    <tr>
        <td align="center">
            <table align="center" border="1" cellpadding="0" cellspacing="0" width="600"
                   style="border-collapse: separate; border-spacing: 2px 5px; box-shadow: 1px 0 1px 1px #B8B8B8;"
                   bgcolor="#FFFFFF">
                <tr>
                    <td align="center" style="padding: 5px 5px 5px 5px;">
                        <a href="http://url-goes-here" target="_blank">
                            <img src="http://logo.png" alt="Logo" style="width:186px;border:0;"/>
                        </a>
                    </td>
                </tr>
                <tr>
                    <td align="center">
                        <!-- Initial relevant banner image goes here under src-->
                        <img src="%s" alt="Image Banner" style="display: block;border:0;" height="100%%" width="600"/>
                    </td>
                </tr>
                <tr>
                    <td bgcolor="#ffffff" style="padding: 40px 30px 40px 30px;">
                        <table border="1" cellpadding="0" cellspacing="0" width="100%%">
                            <tr>
                                <td style="padding: 10px 0 10px 0; font-family: Avenir, sans-serif; font-size: 16px;">
                                    <!-- Initial text goes here-->
                                    %s
                                </td>
                            </tr>
                        </table>
                    </td>
                </tr>
                <tr>
                    <td bgcolor="#E8E8E8">
                        <table border="1" cellpadding="0" cellspacing="0" width="100%%" style="padding: 20px 10px 10px 10px;">
                            <tr>
                                <td width="260" valign="top" style="padding: 0 0 15px 0;">
                                    <table border="1" cellpadding="0" cellspacing="0" width="100%%">
                                        <tr>
                                            <td align="center">
                                                <a href="tel:phone number goes here" target="_blank">
                                                    <img src="url for call image goes here.png" alt="Call us"
                                                         style="display: block;"/>
                                                </a>
                                            </td>
                                        </tr>
                                        <tr>
                                            <td align="center"
                                                style="font-family: Avenir, sans-serif; color:#707070;font-size: 13px;padding: 10px 0 0 0;">
                                                GIVE US A CALL
                                            </td>
                                        </tr>
                                    </table>
                                </td>
                                <td style="font-size: 0; line-height: 0;" width="20">
                                    &nbsp;
                                </td>
                                <td width="260" valign="top">
                                    <table border="1" cellpadding="0" cellspacing="0" width="100%%" >
                                        <tr>
                                            <td align="center">
                                                <a href="mailto:emailgoeshere@gmail.com">
                                                    <img src="url for email image goes here" alt="Email us"
                                                         style="display: block;"/>
                                                </a>
                                            </td>
                                        </tr>
                                        <tr>
                                            <td align="center"
                                                style="font-family: Avenir, sans-serif; color:#707070;font-size: 13px;padding: 10px 0 0 0;">
                                                EMAIL US
                                            </td>
                                        </tr>
                                    </table>
                                </td>
                                <td style="font-size: 0; line-height: 0;" width="20">
                                    &nbsp;
                                </td>
                                <td width="260" valign="top">
                                    <table border="1" cellpadding="0" cellspacing="0" width="100%%">
                                        <tr>
                                            <td align="center">
                                                <a href="url to faq page goes here" target="_blank">
                                                    <img src="url for faq image goes here" alt="FAQ Page"
                                                         style="display: block;"/>
                                                </a>
                                            </td>
                                        </tr>
                                        <tr>
                                            <td align="center"
                                                style="font-family: Avenir, sans-serif; color:#707070;font-size: 13px;padding: 10px 0 0 0;">
                                                BROWSE FAQ PAGE
                                            </td>
                                        </tr>
                                    </table>
                                </td>
                            </tr>
                        </table>
                    </td>
                </tr>
                <tr>
                    <td bgcolor="#66989c" style="padding: 15px 15px 15px 15px;">
                        <table border="1" cellpadding="0" cellspacing="0" width="100%%">
                            <tr>
                                <td align="center">
                                    <table border="1" cellpadding="0" cellspacing="0">
                                        <tr>
                                            <td>
                                                <a href="facebook url goes here" target="_blank">
                                                    <img src="facebook image goes here" alt="Facebook" width="50" height="50"
                                                         style="display: block;" border="1"/>
                                                </a>
                                            </td>
                                            <td style="font-size: 0; line-height: 0;" width="20">&nbsp;</td>
                                            <td>
                                                <a href="youtube page link goes here" target="_blank">
                                                    <img src="youtube image link goes here" alt="Youtube" width="50" height="50"
                                                         style="display: block;" border="1"/>
                                                </a>
                                            </td>
                                            <td style="font-size: 0; line-height: 0;" width="20">&nbsp;</td>
                                            <td>
                                                <a href="twitter page goes here" target="_blank">
                                                    <img src="twitter image goes here" alt="Twitter" width="50" height="50"
                                                         style="display: block;" border="1"/>
                                                </a>
                                            </td>
                                            <td style="font-size: 0; line-height: 0;" width="20">&nbsp;</td>
                                            <td>
                                                <a href="linkedin page goes here" target="_blank">
                                                    <img src="linkedin image goes here" alt="Linkedin" width="50" height="50"
                                                         style="display: block;" border="1"/>
                                                </a>
                                            </td>
                                            <td style="font-size: 0; line-height: 0;" width="20">&nbsp;</td>
                                            <td>
                                                <a href="home page goes here" target="_blank">
                                                    <img src="homepage image goes here" alt="GreenIQ" width="50" height="50"
                                                         style="display: block;" border="1"/>
                                                </a>
                                            </td>
                                        </tr>
                                    </table>
                                </td>
                            </tr>
                        </table>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>
</body>
</html>

A few observations: 

  1. Add alt attributes where needed, so that text is shown instead of an image in case the email client is unable to load it properly.
  2. Add %s placeholders where you'd like data to appear dynamically, depending on the email use case.
  3. If you look carefully, the percentage values appear with an extra '%'. That's because PHP's sprintf, which we'll use to make the template dynamic, treats '%' as the start of a placeholder, so a literal percent sign must be written as '%%'.
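The escaping rule in the third observation is easy to try out. Python's % string formatting follows the same printf-style conventions for %s and %%, so here is a small sketch in Python. The one-row template, URL, and text are made up for illustration and are far shorter than the real template.

```python
# A tiny stand-in for the email template: one literal percent sign and two placeholders.
template = '<table width="100%%"><tr><td><img src="%s"/></td><td>%s</td></tr></table>'

# Hypothetical values, for illustration only.
banner_url = "https://example.com/banner_account_verified.png"
body_text = "Your email has been successfully verified."

# Each %s is replaced in order, and %% collapses to a single literal %.
html = template % (banner_url, body_text)
print(html)
```

After formatting, the width attribute reads 100% again and both placeholders are filled in.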

Note! I've removed the URLs for security and privacy reasons. You can replace them with your own images and personal links.

And that is it! You've successfully developed your own static email template. Now let's get our hands dirty and make it dynamic!

Developing the dynamic template

Save the above code as a template.html file. Now let's create a new PHP file.

On the server side, create the email send method below:

function send_mail_template($to, $from, $subject, $message)
{
  $headers = "MIME-Version: 1.0" . "\r\n";
  $headers .= "Content-type:text/html;charset=UTF-8" . "\r\n";
  $headers .= "From: ContactNameGoesHere <" . $from . ">\r\n";
  // mail() returns true if the message was accepted for delivery
  return mail($to, $subject, $message, $headers);
}

If you look back at the HTML email template, you'll see that I've added %s placeholders in certain places: the banner image element and the body text. All we need to do now is load the HTML template as a regular string, put the relevant text in place of each '%s', and pass the result to the send_mail_template method above.

function build_email_template($email_subject_image, $message)
{
    // Get email template as string
    $email_template_string = file_get_contents('template.html', true);
    // Fill the two %s placeholders: the banner image URL, then the message text
    $email_template = sprintf($email_template_string, 'URL_to_Banner_Images/banner_' . $email_subject_image . '.png', $message);
    return $email_template;
}

With that taken care of, we can use both methods and send our very first dynamic email! Let's use an example. Say a new user has just verified their email. We'd like to automate that use case on the server side and send the user a 'Your email has been successfully verified' email.

Assume the user's verified email is 'user@user.com' and the company's email is 'company@company.com'. We can now send an automated email:

$from = "company@company.com";
$to = "user@user.com";
$body_text = "Your email has been successfully verified...";
$banner_image_subject = "account_verified";
$final_message = build_email_template($banner_image_subject, $body_text);
send_mail_template($to, $from, "Your email has been verified", $final_message);

Finally! You can now use this methodology any way you need. Sending this example with GreenIQ's company images and text applied produces the final email the user receives.

Feel free to ask any question below!

How to send push notifications with PHP

Sending push notifications to an iOS/Android application can greatly enhance the user experience, and it lets you deliver key data easily. However, actually sending the push notification can be tedious and, at times, confusing. You need to make sure you pack your integers and times correctly; if you don't, you'll probably get an unhelpful status from Apple or Google.

I've come across some online PHP scripts for either iOS or Android, but none for both. This PHP script includes an implementation for both mobile operating systems.

PHP Script (For a description, scroll below the script):

function send_mobile_notification_request($user_mobile_info, $payload_info)
{
    //Default result
    $result = -1;
    //Change depending on where to send notifications
    $pem_preference = "production";
    $user_device_type = $user_mobile_info['user_device_type'];
    $user_device_key = $user_mobile_info['user_mobile_token'];
    if ($user_device_type == "iOS") {
        $apns_url = NULL;
        $apns_cert = NULL;
        //Apple server listening port
        $apns_port = 2195;
        if ($pem_preference == "production") {
            $apns_url = 'gateway.push.apple.com';
            $apns_cert = __DIR__.'/cert-prod.pem';
        }
        //develop .pem
        else {
            $apns_url = 'gateway.sandbox.push.apple.com';
            $apns_cert = __DIR__.'/cert-dev.pem';
        }
        $stream_context = stream_context_create();
        stream_context_set_option($stream_context, 'ssl', 'local_cert', $apns_cert);
        $apns = stream_socket_client('ssl://' . $apns_url . ':' . $apns_port, $error, $error_string, 2, STREAM_CLIENT_CONNECT, $stream_context);
        //Frame: command byte, 2-byte token length, token, 2-byte payload length, payload
        $apns_message = chr(0) . pack('n', 32) . pack('H*', str_replace(' ', '', $user_device_key)) . pack('n', strlen($payload_info)) . $payload_info;
        if ($apns) {
            $result = fwrite($apns, $apns_message);
            fclose($apns);
        }
    }
    else if ($user_device_type == "Android") {
        // API access key from Google API's Console
        if (!defined('API_ACCESS_KEY')) {
            define('API_ACCESS_KEY', 'ADD_YOUR_API_KEY_HERE');
        }
        // prep the bundle
        $msg = array
        (
            'message' => json_decode($payload_info)->aps->alert,
            'title' => 'This is a title. title',
            'subtitle' => 'This is a subtitle. subtitle',
            'tickerText' => 'Ticker text here...Ticker text here...',
            'vibrate' => 1,
            'sound' => 1,
            'largeIcon' => 'large_icon',
            'smallIcon' => 'small_icon'
        );
        $fields = array
        (
            'registration_ids' => array($user_device_key),
            'data' => $msg
        );
        $headers = array
        (
            'Authorization: key=' . API_ACCESS_KEY,
            'Content-Type: application/json'
        );
        $ch = curl_init();
        curl_setopt( $ch,CURLOPT_URL, 'https://android.googleapis.com/gcm/send' );
        curl_setopt( $ch,CURLOPT_POST, true );
        curl_setopt( $ch,CURLOPT_HTTPHEADER, $headers );
        curl_setopt( $ch,CURLOPT_RETURNTRANSFER, false );
        //Keep certificate verification on so the request can't be silently intercepted
        curl_setopt( $ch,CURLOPT_SSL_VERIFYPEER, true );
        curl_setopt( $ch,CURLOPT_POSTFIELDS, json_encode( $fields ) );
        $result = curl_exec($ch);
        curl_close($ch);
    }
    return $result > 0;
}

function create_payload_json($message) {
    //Badge icon to show at users ios app icon after receiving notification
    $badge = "0";
    $sound = 'default';
    $payload = array();
    $payload['aps'] = array('alert' => $message, 'badge' => intval($badge),'sound' => $sound);
    return json_encode($payload);
}

Description

Let's start. The first method builds the body of the notification request according to the user's operating system and sends it. A push notification travels first to Apple's or Google's servers, and only then to the end user. To make that routing possible, each end user's mobile device holds a unique token. How to retrieve the user device key on Android or iOS is not covered here.

I wrote the main method to take two inputs: $user_mobile_info, an array containing the user's device type and unique device key, and $payload_info, a JSON string containing the body of the push notification (built by the second method). The $pem_preference variable inside the method is hard-coded, but you can change it to suit your needs. Apple offers two servers: a sandbox for development and QA (gateway.sandbox.push.apple.com) and a regular one for production (gateway.push.apple.com). If you're in the testing phase of your development, just change the URL or the variable itself.

The second method builds the message body. I've hard-coded some variables, such as the sound and the badge. Sound can be changed to various options, and badge sets the badge number shown on the app icon when the user receives the notification. I've set it to "0", meaning there will be no badge icon when receiving notifications.
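For the iOS branch, the bytes written to the socket follow a simple layout: a command byte, a 2-byte big-endian token length, the token itself, a 2-byte big-endian payload length, and the payload. Here is a minimal sketch of that framing in Python. The helper name build_apns_frame and the device token are made up for illustration; nothing is sent over the network.

```python
import json
import struct

def create_payload_json(message, badge=0, sound="default"):
    """Build the JSON payload with the same fields as the PHP script."""
    return json.dumps({"aps": {"alert": message, "badge": badge, "sound": sound}})

def build_apns_frame(token_hex, payload):
    """Pack one notification: command, token length, token, payload length, payload."""
    token = bytes.fromhex(token_hex.replace(" ", ""))
    body = payload.encode("utf-8")
    # "!" means network (big-endian) byte order: B is 1 byte, H is 2 bytes.
    header = struct.pack("!BH", 0, len(token))
    return header + token + struct.pack("!H", len(body)) + body

# A made-up 32-byte device token (64 hex characters), for illustration only.
token_hex = "ab" * 32
frame = build_apns_frame(token_hex, create_payload_json("I know how to send push notifications!"))
print(len(frame))
```

Packing both lengths as 2-byte integers is what "packing your integers correctly" comes down to here. A single length byte only works while the payload stays under 256 bytes.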

Usage Example

The main part of the push notification is the message itself. Let's say the notification we want to send is "I know how to send push notifications!". We'll first create the payload JSON using the second method:

$payload = create_payload_json("I know how to send push notifications!");

Let's say the user has an iOS device (this info can be kept on a server, in a database, etc. for each user) and the array is as follows:

$user_mobile_info = ['user_device_type'=>"iOS", 'user_mobile_token'=>'1234ABCD'];

Now we can send the notification itself using the first method: 

send_mobile_notification_request($user_mobile_info, $payload);

There are many minor details that this blog post hasn't covered. Feel free to leave me a comment if you have any further questions.