ParserYahoo.C File Reference

#include "ParserYahoo.H"
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include "HTMLGrabber.H"
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

Defines

#define PARSERYAHOO_GRAB_RETRIES   3
 Number of times to retry in case of wget failure.
#define PARSERYAHOO_DEPTH_TO_PARSE   90
 Default Depth to Parse, if none is given as a parameter.
#define PARSERYAHOO_EVENTS_PER_PAGE   10
 Number of Events given on each main page.
#define PARSERYAHOO_SUBPAGE_PREFIX_LENGTH   27
 Length of subpage_Prefix; #define'd here for convenience.
#define PARSERYAHOO_EVENT_PREFIX_LENGTH   34
 Length of event_Title_Prefix; #define'd here for convenience.
#define PARSERYAHOO_EVENT_BEGIN_LENGTH   63
 Length of event_Begin; #define'd here for convenience.
#define PARSERYAHOO_CATEGORIES_AFTER_BEGIN_LENGTH   23
 Length of categories_After_Begin; #define'd here for convenience.
#define PARSERYAHOO_DESCRIPTION_BEGIN_LENGTH   23
 Length of description_Begin; #define'd here for convenience.
#define PARSERYAHOO_CATEGORIES_BEGIN_LENGTH   20
 Length of categories_Begin; #define'd here for convenience.
#define PARSERYAHOO_MAX_CATEGORY_NAME_LENGTH   64
 Maximum length of a category name.
#define PARSERYAHOO_MAX_SUBPAGE_URL_LEN   300
 Maximum number of characters (with a bunch of padding) in the URL of a subpage.
#define PARSERYAHOO_MAX_PAGE_URL_LEN   400
 Maximum number of characters (with a bunch of padding) in the URL of a main page.
#define PARSERYAHOO_STARTING_URL_LEN   340
 Length of the starting URL.

Functions

bool findSubPages (FILE *page)
 Finds sub-pages given Yahoo Local's main search results page.
bool findNextPage ()
 Finds the next main-page, based on the current page.
bool yahooIsEmptyFile (char *fileName)
 Determines if a given file is empty (most likely due to wget failure).
Event::EventCategory processCategoryString (char *category)
 Map a found Category string to our representation of Category.
Event parseEvent (char *url)
 Creates an Event from a string representation of a URL.
Event getEventFromFile (FILE *subPage)
 Creates an Event based on a given HTML file for a single Event on Yahoo Local.
bool findMetaLine (size_t size, char **str, FILE *subPage)
 Advances the given file to the metadata line and stores that line in str.
bool findDetailsLine (size_t size, char **str, FILE *subPage)
 Advances the given file to the details line and stores that line in str.
bool findCatDescLine (size_t size, char **str, FILE *subPage)
 Advances the given file to the line containing the description and categories and stores that line in str.
bool handleMetaLine (char *str, Event *newEvent)
 Fills in the Event with information from the meta line (from findMetaLine).
bool handleDetailsLine (char *str, Event *newEvent)
 Fills in the Event with information from the details line (from findDetailsLine).
bool handleCatDescLine (char *str, Event *newEvent)
 Fills in the Event with information from the categories/description line (from findCatDescLine).

Variables

char yahoo_Sub_Pages [PARSERYAHOO_EVENTS_PER_PAGE][PARSERYAHOO_MAX_SUBPAGE_URL_LEN]
 Sub-Pages found during a single parse of the main page.
char yahoo_Next_Page [PARSERYAHOO_MAX_PAGE_URL_LEN]
 The next main page to parse.
FILE * yahoo_Page
 Currently-processed webpage.
HTMLGrabber yahoo_myGrabber
 Grabber of ye olde HTML.
char * subpage_Prefix = "<div class=\"cont\"><a href=\""
 Prefix of line containing start of Event info on main page.
char * event_Title_Prefix = "<meta name=\"DESCRIPTION\" content=\""
 Prefix of line containing basic meta info on Event page.
char * event_Begin = "<div id=\"ylsband\"><div class=\"ylsdefbxc\"><div class=\"ylstopbx\">"
 Prefix of line containing details info on Event page.
char * categories_After_Begin = "<div class=\"ylsmrinfo\">"
 Prefix of line AFTER line containing cat / desc info on Event page.
char * description_Begin = "Event Information: </b>"
 Beginning of Description section on Categories/Description line on Event page.
char * categories_Begin = "Category Types: </b>"
 Beginning of Categories section on Categories/Description line on Event page.
char * yahoo_Starting_Url = "http://local.yahoo.com/results;_ylt=At1y0HMWlf845KuB5bQ750uHNcIF;_ylu=X3oDMTBxOHY3ZWlmBF9zAzk2NjEzNzY3BHNlYwNwYWdpbmF0aW9u?stx=Events+Performances&city=North+Providence&state=RI&dma=521&uzip=02911&radius=50&fmap=134217177&sortby=aevent&flnstr=&flsstr=&search=event&ed=zKqLVq131Dwihmm_.HGw5Vos.rEs2X72GC4EcJiVi9qt.id5&ppg_nm=2&pg_nm=1&xargs="
 Starting URL.
int events_Processed = 0
 Number of Events processed. Only used for output convenience.
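
The PARSERYAHOO_*_LENGTH values listed under Defines equal the lengths of the marker strings listed above. The following is a minimal sketch, not the file's actual code, of how those lengths could be derived from the literals at compile time so the two cannot drift apart. The names literalLength, startsWith, and the k-prefixed constants are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Length of a string literal, excluding the terminating NUL.
template <std::size_t N>
constexpr std::size_t literalLength(const char (&)[N]) {
    return N - 1;
}

// Marker strings copied from the variable list above (hypothetical constant names).
constexpr char kSubpagePrefix[] = "<div class=\"cont\"><a href=\"";
constexpr char kEventTitlePrefix[] = "<meta name=\"DESCRIPTION\" content=\"";
constexpr char kEventBegin[] =
    "<div id=\"ylsband\"><div class=\"ylsdefbxc\"><div class=\"ylstopbx\">";
constexpr char kCategoriesAfterBegin[] = "<div class=\"ylsmrinfo\">";
constexpr char kDescriptionBegin[] = "Event Information: </b>";
constexpr char kCategoriesBegin[] = "Category Types: </b>";

// Each check fails to compile if a literal and its documented length disagree.
static_assert(literalLength(kSubpagePrefix) == 27, "PARSERYAHOO_SUBPAGE_PREFIX_LENGTH");
static_assert(literalLength(kEventTitlePrefix) == 34, "PARSERYAHOO_EVENT_PREFIX_LENGTH");
static_assert(literalLength(kEventBegin) == 63, "PARSERYAHOO_EVENT_BEGIN_LENGTH");
static_assert(literalLength(kCategoriesAfterBegin) == 23,
              "PARSERYAHOO_CATEGORIES_AFTER_BEGIN_LENGTH");
static_assert(literalLength(kDescriptionBegin) == 23, "PARSERYAHOO_DESCRIPTION_BEGIN_LENGTH");
static_assert(literalLength(kCategoriesBegin) == 20, "PARSERYAHOO_CATEGORIES_BEGIN_LENGTH");

// Hypothetical helper: true if `line` starts with the given marker literal.
template <std::size_t N>
bool startsWith(const char *line, const char (&marker)[N]) {
    assert(line != nullptr);
    return std::strncmp(line, marker, N - 1) == 0;
}
```

Declaring the markers as const arrays also avoids binding a string literal to a non-const char *, which modern C++ compilers reject.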

Detailed Description

Author:
Andrew Simon
Date:
05-02-06

Function Documentation

bool findCatDescLine (size_t size, char **str, FILE *subPage)

Advances the given file to the line containing the description and categories and stores that line in str.

Parameters:
size The buffer size of str
str char buffer. Will be assigned the desired line of HTML
subPage Page to be read / advanced
Returns:
true if line is found, false otherwise
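
Because categories_After_Begin marks the line after the one wanted, a scan for the categories/description line has to remember the previous line. The following is a minimal sketch under that assumption; findLineBeforePrefix is a hypothetical name, and the real function may handle buffers differently.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: reads `page` line by line until a line starts with
// `prefix`, then returns true with the PRECEDING line stored in *str.
// *str must point to a caller-owned buffer of at least `size` bytes.
bool findLineBeforePrefix(std::size_t size, char **str, std::FILE *page,
                          const char *prefix) {
    assert(size > 0 && str != nullptr && *str != nullptr);
    assert(page != nullptr && prefix != nullptr);
    const std::size_t prefixLen = std::strlen(prefix);
    std::vector<char> current(size);
    (*str)[0] = '\0';  // no previous line seen yet
    while (std::fgets(current.data(), static_cast<int>(size), page) != nullptr) {
        if (std::strncmp(current.data(), prefix, prefixLen) == 0) {
            // Found the marker; succeed only if a line came before it.
            return (*str)[0] != '\0';
        }
        // fgets NUL-terminates within `size` bytes, so this copy fits in *str.
        std::strcpy(*str, current.data());
    }
    return false;  // marker never found; *str holds the last line read
}
```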

bool findDetailsLine (size_t size, char **str, FILE *subPage)

Advances the given file to the details line and stores that line in str.

Parameters:
size The buffer size of str
str char buffer. Will be assigned the desired line of HTML
subPage Page to be read / advanced
Returns:
true if line is found, false otherwise

bool findMetaLine (size_t size, char **str, FILE *subPage)

Advances the given file to the metadata line and stores that line in str.

Parameters:
size The buffer size of str
str char buffer. Will be assigned the desired line of HTML
subPage Page to be read / advanced
Returns:
true if line is found, false otherwise
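
A prefix scan of this kind can be sketched as follows. This is an illustration only: findLineWithPrefix is a hypothetical name, the prefix is passed in rather than read from event_Title_Prefix, and the actual implementation may read lines differently.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: reads lines from `page` into the caller-owned buffer
// *str (capacity `size`) until one starts with `prefix`. Returns true with the
// matching line left in *str, or false if the end of the file is reached first.
bool findLineWithPrefix(std::size_t size, char **str, std::FILE *page,
                        const char *prefix) {
    assert(size > 0 && str != nullptr && *str != nullptr);
    assert(page != nullptr && prefix != nullptr);
    const std::size_t prefixLen = std::strlen(prefix);
    while (std::fgets(*str, static_cast<int>(size), page) != nullptr) {
        if (std::strncmp(*str, prefix, prefixLen) == 0) {
            return true;
        }
    }
    return false;
}
```

Note that fgets splits lines longer than size into several chunks, so the buffer should be larger than the longest line expected on the page.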

bool findNextPage ()

Finds the next main-page, based on the current page.

Sets yahoo_Next_Page to the result.

Returns:
true if everything went well, false otherwise.

bool findSubPages (FILE *page)

Finds sub-pages given Yahoo Local's main search results page.

Parameters:
page An open FILE stream for the desired page
Returns:
true if everything went well, false otherwise.
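
As an illustration of the extraction step, the sketch below copies the URL that follows subpage_Prefix on each matching line, up to the closing quote. It is written under assumptions: collectSubPageUrls is a hypothetical name, it returns a count instead of a bool, it fills a caller-supplied array rather than yahoo_Sub_Pages, and the 4096-byte line buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

const std::size_t kEventsPerPage = 10;      // PARSERYAHOO_EVENTS_PER_PAGE
const std::size_t kMaxSubpageUrlLen = 300;  // PARSERYAHOO_MAX_SUBPAGE_URL_LEN
const char kSubpagePrefix[] = "<div class=\"cont\"><a href=\"";

// Hypothetical helper: stores up to kEventsPerPage sub-page URLs found in
// `page` into `urls` and returns how many were stored.
std::size_t collectSubPageUrls(std::FILE *page, char urls[][kMaxSubpageUrlLen]) {
    assert(page != nullptr && urls != nullptr);
    const std::size_t prefixLen = sizeof(kSubpagePrefix) - 1;
    char line[4096];  // assumed upper bound on the length of a line
    std::size_t found = 0;
    while (found < kEventsPerPage &&
           std::fgets(line, static_cast<int>(sizeof line), page) != nullptr) {
        if (std::strncmp(line, kSubpagePrefix, prefixLen) != 0) {
            continue;  // not an event line
        }
        const char *url = line + prefixLen;
        const char *end = std::strchr(url, '"');
        if (end == nullptr) {
            continue;  // no closing quote on this line
        }
        const std::size_t len = static_cast<std::size_t>(end - url);
        if (len >= kMaxSubpageUrlLen) {
            continue;  // URL would not fit in its slot
        }
        std::memcpy(urls[found], url, len);
        urls[found][len] = '\0';
        ++found;
    }
    return found;
}
```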

Event getEventFromFile (FILE *subPage)

Creates an Event based on a given HTML file for a single Event on Yahoo Local.

Parameters:
subPage HTML of Yahoo Local Event page
Returns:
The found Event. Nearly all fields should be set, except Lat/Long, End time, and sometimes Description

bool handleCatDescLine (char *str, Event *newEvent)

Fills in the Event with information from the categories/description line (from findCatDescLine).

Parameters:
str The line of HTML containing the details we're after. Should be set by findCatDescLine().
newEvent Event; we will set Category and possibly Description.
Returns:
true if everything went well, false otherwise.
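
One way to pull a field out of the categories/description line is to locate its marker (categories_Begin or description_Begin) and take the text that follows. The sketch below assumes the field ends at the next '<', which the source does not confirm; textAfterMarker is a hypothetical name. An empty result corresponds to a missing field, such as an Event without a Description.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: returns the text that follows `marker` in `line`, up to
// the next '<' (assumed field terminator) or the end of the line. Returns an
// empty string if the marker does not occur.
std::string textAfterMarker(const char *line, const char *marker) {
    assert(line != nullptr && marker != nullptr);
    const char *start = std::strstr(line, marker);
    if (start == nullptr) {
        return std::string();
    }
    start += std::strlen(marker);
    const char *end = std::strchr(start, '<');
    return end != nullptr ? std::string(start, end) : std::string(start);
}
```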

bool handleDetailsLine (char *str, Event *newEvent)

Fills in the Event with information from the details line (from findDetailsLine).

Parameters:
str The line of HTML containing the details we're after. Should be set by findDetailsLine().
newEvent Event; we will set Location Title, Zip, and Start Time (possibly with day)
Returns:
true if everything went well, false otherwise.

bool handleMetaLine (char *str, Event *newEvent)

Fills in the Event with information from the meta line (from findMetaLine).

Parameters:
str The line of HTML containing the meta-data we're after. Should be set by findMetaLine().
newEvent Event; we will set City, Title, Street, State
Returns:
true if everything went well, false otherwise.

Event parseEvent (char *url)

Creates an Event from a string representation of a URL.

Handles fetching the page and recovering from download failures, then calls getEventFromFile to do the actual parsing.

Parameters:
url String representation of desired URL.
Returns:
Event found at given URL, or Empty event if sufficiently severe error(s) occurred.
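
The retry behaviour implied by PARSERYAHOO_GRAB_RETRIES can be sketched as a small loop. This is an assumption-laden illustration: tryWithRetries is a hypothetical name, the download is abstracted as a callable because the HTMLGrabber interface is not documented here, and the count is interpreted as one initial attempt plus up to that many retries.

```cpp
#include <cassert>
#include <functional>

const int kGrabRetries = 3;  // PARSERYAHOO_GRAB_RETRIES

// Hypothetical helper: calls `attempt` once, then up to `retries` more times,
// stopping as soon as it reports success. Returns false if every call fails.
bool tryWithRetries(const std::function<bool()> &attempt, int retries = kGrabRetries) {
    assert(attempt && retries >= 0);
    for (int i = 0; i <= retries; ++i) {
        if (attempt()) {
            return true;
        }
    }
    return false;
}
```

In this sketch the callable would download the page and report failure when the resulting file is empty, after which the caller could fall back to an empty Event.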

Event::EventCategory processCategoryString (char *category)

Map a found Category string to our representation of Category.

Returns OTHER_UNKNOWN and prints a warning if the category string is unrecognized.

Parameters:
category Category String.
Returns:
The EventCategory that the given string maps to
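
A table-driven lookup is one simple way to implement such a mapping. In the sketch below only OTHER_UNKNOWN comes from this documentation; the stand-in enum, the MUSIC and SPORTS values, the category strings, and the name categoryFromString are all hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Stand-in for Event::EventCategory; MUSIC and SPORTS are made-up values.
enum EventCategory { MUSIC, SPORTS, OTHER_UNKNOWN };

struct CategoryName {
    const char *name;
    EventCategory category;
};

// Hypothetical category strings as they might appear on a page.
const CategoryName kCategoryTable[] = {
    {"Music", MUSIC},
    {"Sports", SPORTS},
};

// Hypothetical helper: maps a category string to its enum value, warning on
// stderr and falling back to OTHER_UNKNOWN when the string is not in the table.
EventCategory categoryFromString(const char *category) {
    assert(category != nullptr);
    for (const CategoryName &entry : kCategoryTable) {
        if (std::strcmp(category, entry.name) == 0) {
            return entry.category;
        }
    }
    std::fprintf(stderr, "Warning: unrecognized category \"%s\"\n", category);
    return OTHER_UNKNOWN;
}
```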

bool yahooIsEmptyFile (char *fileName)

Determines if a given file is empty (most likely due to wget failure).

Parameters:
fileName Name of the file to check.
Returns:
true if file is empty, false otherwise.
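
Since the file includes <sys/stat.h>, the check is plausibly based on stat(). The following is a minimal sketch under that assumption; isEmptyFile is a hypothetical name, and treating a file that cannot be stat'ed as empty is a choice made here, not documented behaviour.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/stat.h>

// Hypothetical helper: true if `fileName` has size zero. A file that cannot be
// stat'ed (for example, because it does not exist) is also reported as empty,
// on the assumption that a missing download is as unusable as an empty one.
bool isEmptyFile(const char *fileName) {
    assert(fileName != nullptr);
    struct stat info;
    if (stat(fileName, &info) != 0) {
        return true;
    }
    return info.st_size == 0;
}
```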


Variable Documentation

char* yahoo_Starting_Url = "http://local.yahoo.com/results;_ylt=At1y0HMWlf845KuB5bQ750uHNcIF;_ylu=X3oDMTBxOHY3ZWlmBF9zAzk2NjEzNzY3BHNlYwNwYWdpbmF0aW9u?stx=Events+Performances&city=North+Providence&state=RI&dma=521&uzip=02911&radius=50&fmap=134217177&sortby=aevent&flnstr=&flsstr=&search=event&ed=zKqLVq131Dwihmm_.HGw5Vos.rEs2X72GC4EcJiVi9qt.id5&ppg_nm=2&pg_nm=1&xargs="
 

Starting URL.

To search another city or change the search parameters, edit this URL. If you change it, you may also need to change how the next main page is located.


Generated on Wed May 17 22:28:20 2006 for GeoEvents by  doxygen 1.4.2