#!/usr/bin/perl
#
$bapropos = q`
mira(1r) - text-only, command-line oriented web browser
`;
$esc = "\033";
$g_bold = $esc . "[1m";
$g_normal = $esc . "[0m"; $zn = "\005";
$b = $g_bold; $n = $g_normal;
$cmdhelp = "
MIRA (Munafo's Internet Research Assistant) is a simple web browser
created to satisfy the following objectives (listed by priority):
- When you quit or restart the browser after a crash, it retains the
history ('back' command's URL list) you had before the quit/crash
- Keep a permanent record of where you have been and the text you have
read while searching the Internet
- Allow viewing of old web pages (and old versions of current web
pages) even after the original server no longer has them
- Allow the user to easily search through all this old saved text for
one or more words or a phrase, etc.
- Retain simplicity by remaining entirely text-only (like $b"."lynx$n) and
relying on external programs (like $b"."xv$n and $b"."arena$n) to view graphics
and graphics-intensive pages
";
$help = qq`
NAME
MIRA - text-only, command-line oriented web browser
DESCRIPTION
$cmdhelp
For more perl goodness, go to mrob.com/pub/perl
SEE ALSO
bget - Download or stream data from a URL to a file or stdout
strip-wayback - Remove archive.org headers and links from HTML
`;
$unused_header = q`
Revision History:
19980706 Discover nested forms in HotBot results: submitting something
from the outer form that occurs after the end of the inner form doesn't
work. Fix by using shift and unshift to keep a stack of form numbers.
19980723 Add 'gsu' command to make global history searches easier!
19980916 Add routines to save and restore the "local history stack".
(period of many undocumented changes)
19981218 Begin recording dates in global log (g_log) file
19990126 Over the last few days I have added color-coding to show
which links have been visited and, of those that have not, which are on
the same host as the current page. This aids in the (recently more
frequent) practice of manually loading everything about a given
subject just so I can have the text in the archive.
19990127 Add support for viewing PDFs through gv, but gv doesn't
seem to display the PDFs properly.
19990201 #glr# command works. (#ghr# is still pending). Fix bugs in
#gsc# and add default command to #gsc# making it easy to continue the
search to older pages.
19990208 Form submissions now include values of checked "radio"
items, and recognize numerical values of "select" items; this makes
DejaNews queries work properly.
19990209 #f# command now supports a numeric argument; added #sf#
command; #bm# now prints the title and URL it's adding
19990211 Now you can type #rf# to view form fields after you've
filled them in; #m# and #vs# commands include similar changes.
19990212 Now you can type "a .." from a URL that ends in "/" and it
will go up one directory. This complements the use of "a " from URLs
that don't end in "/".
19990216 Added #ga# command allowing jump to an anchor based on text
in the anchor's label. Often useful on pages returned by search
engines, where anchors like "Next" occur with an unpredictable number
but a predictable name.
19990219 HotBot has now set up its links as queries to HotBot,
which return a page containing a redirected URL. Modify MIRA to handle
this by retroactively changing its internal URLs (in anc_base, stack,
global history and log) and copying the already-loaded data into
another place in the cache.
19990223 Fix a bug that made DejaNews queries not get cached
19990224 Fix yet another bug that made redirected pages (like HotBot
hits) not get cached. This time it's fixed a lot better: I found that
there was some confusion as to where the extra cache copy should be
created and decided to put it completely in load_n_cache.
19990301 formatting chain now generates an output file containing
plain ASCII output; #p# now has an option to save in plain ASCII
format. #p#'s (S) and (A) options append if file already exists.
19990302 #p# command switches color to magenta.
19990303 Fix some bugs in newline formatting to plain ASCII output.
19990304 Add URL and date stamp to beginning of plain ASCII output
file. #gsc# now takes a quoted argument to allow searching for strings
containing spaces.
19990308 add_anchor now gleans hrefs from within query hrefs,
essentially restoring direct links to HotBot query results pages (see
19990219). This allows the user to see if the link has been visited,
very useful when a new search returns links visited through a previous
search. Also, direct links are often faster and more reliable. Make
AREA anchors have colors just like normal anchors.
19990310 AREA labels now show part of their href text to make them
less ambiguous. Fixed bug with #f# in GHR mode.
19990312 Found a page that ends anchors with instead of .
19990325 add_anchor1 now recognizes "javascript:openWindow" in URLs.
|)
# || ($l =~ m|^ text
) {
$l = "";
$ignore = 0;
} elsif ($l =~ m|^?noframe|) {
# nothing
return;
} elsif ($l =~ m|")
|| ($l eq "")
|| ($l eq "")
|| ($l eq "