Digitizing Business Cards

As digital as our world has become, there is still a tremendous amount of paperwork in it. Schools and workplaces are attempting to conserve resources by going paperless. Luckily, we can use computer science to help with the transition.

One of the challenges of the paperless office, and one of its benefits, is converting information usually kept on paper into an organized electronic format. Doing so meets two goals:

  1. Use of paper is reduced. This is good for the environment and allows for more efficient use of space.
  2. The information is now digitally represented. It is no longer tied to a physical, static piece of paper and can be manipulated computationally. Instead of hunting for John Smith’s business card, you can search for it electronically. The information can also be copied without the need for additional physical resources.

Many products exist on the market for converting business cards into electronic data. For instance, a Google search for the term “business card scanner” yields many hits that offer solutions. The process is similar to screen scraping and is another real-life example of Creating Structure From Unstructured Data Sets.

Creating Usable, Useful Electronic Data

Think about the process necessary to convert the paper card into electronic data:

  • A digital picture is taken capturing the visible information on the card.
    • At this point, the card is no longer needed. The image of the card has the same functionality as the card itself.
  • The image needs to be converted into text, and any images (such as photos or logos) need to be isolated and cropped.
    • The text data is now unstructured. It has no organization!
  • Before we store this data, we need to isolate the bits of the text corresponding to the attributes we care about (e.g., name, address, phone #).

The final point is the one on which we’ll concentrate. Consider these two business cards:

[Images: Business Card A and Business Card B]

In order to convert these cards to database contact records, we need to first figure out which attributes our contact records support. Obviously, names and telephone numbers are needed, but what about headshots or job titles?

Second, we need to extract the information from each card and fill in the contact records appropriately. Complete the following contact record table with the information from the cards above:

Attribute      | Business Card A | Business Card B
First name     |                 |
Last name      |                 |
Company        |                 |
Job title      |                 |
Phone #        |                 |
Fax #          |                 |
Email address  |                 |
Website        |                 |
Business name  |                 |
Street address |                 |
Street name    |                 |
Unit/Apt. #    |                 |
City           |                 |
State          |                 |
Postal code    |                 |
Country        |                 |

Discuss the following questions as a class:

  • Which design/content attributes were necessary, and which were not?
  • What other content is sometimes found on business cards? Perform an Internet search to find more examples, if you like.
  • Do any attributes require a specific format? If so, what should the format be?
  • Would any attributes benefit from having a default value?
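One way to record decisions about required attributes, formats, and defaults is a record type. The sketch below is a minimal Python illustration, not a prescribed answer: the class name ContactRecord, the choice of which fields are required, the default country, and the digits-only phone format are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactRecord:
    # Assumed to be required: a record without a name is hard to search for.
    first_name: str
    last_name: str
    # Assumed to be optional: not every card lists these.
    phone: Optional[str] = None
    fax: Optional[str] = None
    email: Optional[str] = None
    # Assumed default value; a real system would pick its own.
    country: str = "USA"


def normalize_phone(raw: str) -> str:
    """Keep digits only, so '(555) 010-2000' and '555.010.2000' are stored the same way."""
    return re.sub(r"\D", "", raw)


record = ContactRecord("John", "Smith", phone=normalize_phone("(555) 010-2000"))
print(record.phone)    # 5550102000
print(record.country)  # USA
```

Storing one agreed format per attribute makes later searching and comparison simpler, at the cost of discarding how the card originally displayed the value.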

All of these factors dictate how the record should be filled—but there is more to it. Discuss the following questions as a class:

  • How is information located on the card?
  • How might a program locate the right information to fill in the attributes for each contact record?
    • Humans have mental rules and intuition for figuring this out, but computers need algorithms to follow and programs to execute. An effective algorithm would work for the two cards in this scenario, but a highly effective algorithm would work with every possible business card design.
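As one illustration of how a program might use location cues, the sketch below looks at each line of the card text for something shaped like a phone number and then uses a nearby label to decide which attribute it fills. Everything here is an assumption for the example: the function name, the pattern, the sample card text, and the rule that a line mentioning "fax" holds the fax number.

```python
import re

# Assumed shape of a phone number: digits with optional spaces,
# parentheses, dots, or hyphens, at least eight characters long.
PHONE_PATTERN = re.compile(r"\+?\(?\d[\d\s().-]{6,}\d")


def find_labeled_numbers(card_text):
    """Return a dict mapping 'phone' or 'fax' to the first matching number."""
    found = {}
    for line in card_text.splitlines():
        match = PHONE_PATTERN.search(line)
        if not match:
            continue
        # Use the text around the number as a cue for which attribute it is.
        label = "fax" if "fax" in line.lower() else "phone"
        found.setdefault(label, match.group())
    return found


# Hypothetical text as it might come out of the image-to-text step.
sample = """John Smith
Example Corp
Tel: (555) 010-2000
Fax: (555) 010-2001
"""

print(find_labeled_numbers(sample))
# {'phone': '(555) 010-2000', 'fax': '(555) 010-2001'}
```

This works for the sample but is far from highly effective: a nine-digit postal code such as 12345-6789 also fits the pattern, and a card with no labels at all gives the program nothing to tell phone from fax.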

Instructions

Your job is to develop algorithms for identifying five different attributes that are found on business cards. Choose from attributes such as the following:

  • First name
  • Last name
  • Company
  • Job title
  • Phone #
  • Fax #
  • Email address
  • Website
  • Business name
  • Street address
  • Street name
  • Unit/Apt. #
  • City
  • State
  • Postal code
  • Country

Your rules should make it clear how to automate the collection of information, but they do not have to be written in “code.” Here is an example (you may not use this example as your own!):

To identify an email address: search all text on the business card for the following format:
    _______@_______.___
    • each run of underscores ("_______") represents alphanumeric text
    • the match may not contain spaces or line breaks
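For readers curious how such a rule looks as a program, here is a minimal Python sketch of the example rule using a regular expression. The names EMAIL_PATTERN and find_emails and the sample text are made up for illustration, and the pattern follows the rule literally (alphanumeric text only), which is stricter than real email addresses.

```python
import re

# alphanumeric text, "@", alphanumeric text, ".", alphanumeric text
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9]+@[A-Za-z0-9]+\.[A-Za-z0-9]+")


def find_emails(card_text):
    """Return every piece of text on the card that fits the rule."""
    return EMAIL_PATTERN.findall(card_text)


print(find_emails("John Smith\njsmith@example.com\n555-0100"))
# ['jsmith@example.com']

# The literal rule breaks down when the name contains a dot:
print(find_emails("john.smith@example.com"))
# ['smith@example.com']
```

The second call shows why a rule needs testing against many cards: the address is found, but only partially, because the rule does not allow a dot before the "@".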

This is only one strategy for identifying a specific string of text. Other strategies may be more useful for other items, so be creative and experiment. Then, test your rules by searching the Internet for alternate business card designs. Will your rules work for all the cards you find? Make adjustments as necessary. This will help you develop effective algorithms for finding useful and usable data.
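Testing a rule against several cards can itself be automated. The sketch below is one possible approach, with every name and sample invented for illustration: it runs a hypothetical postal code rule over card texts paired with the answer a human would give and reports the cases where the rule disagrees.

```python
import re


def find_postal_code(card_text):
    """Hypothetical rule: the first standalone run of exactly five digits."""
    match = re.search(r"\b\d{5}\b", card_text)
    return match.group() if match else None


def failing_cases(rule, cases):
    """Return (text, expected, actual) for each case the rule gets wrong."""
    failures = []
    for text, expected in cases:
        actual = rule(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Made-up card texts paired with the postal code a human would pick.
cases = [
    ("12 Main St\nSpringfield, IL 62701", "62701"),
    # This street number also has five digits, so the rule picks it by mistake.
    ("10200 Main St\nSpringfield, IL 62701", "62701"),
]

for text, expected, actual in failing_cases(find_postal_code, cases):
    print("expected", expected, "but got", actual)
# expected 62701 but got 10200
```

Each failure points to an adjustment worth trying, such as preferring the five digits that follow a state abbreviation.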

Submission

Submit a text document with your five algorithms. Be prepared to share your algorithms with the class and demonstrate how they apply to multiple business card designs.