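use strict;
use warnings;
use LWP::UserAgent;
use npbtools;    # local module, assumed to provide filePut()

my $gamenum    = shift;         # game id, e.g. 2008032001
my $dirname    = shift;         # directory the pages are written to
my $maxinnings = shift || 9;
my $maxpos     = 30;            # ASSUMED cap on batters per frame; the original bound was lost
my $browser    = LWP::UserAgent->new;

for (my $inning = 1; $inning <= $maxinnings; $inning++) {
    for (my $team = 1; $team <= 2; $team++) {    # assumed: two frames per inning
        for (my $pos = 1; $pos <= $maxpos; $pos++) {
            my $realinn = sprintf "%02d", $inning;
            my $realpos = sprintf "%02d", $pos;
            my $boxurl  = "http://baseball.yahoo.co.jp/npb/live?id=$gamenum&key=${realinn}_${team}_${realpos}";
            my $boxpage = $browser->get($boxurl);
            next if $boxpage->code == 404;    # skip plate appearances that do not exist
            my $boxscore = $boxpage->content;
            my $outfname = "$dirname/${gamenum}_${realinn}_${team}_${realpos}.html";
            my @outlines = ($boxscore);
            print "processing $outfname\n";
            filePut($outfname, \@outlines);
        }
    }
}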
Since the start of the season I have been downloading the individual plate appearance pages (such as http://baseball.yahoo.co.jp/npb/live?id=2008032001&key=01_1_01). I have been doing this by creating a list of all possible URLs for the date (within reason) by counting the ".." from the boxscores on this site and then using a download manager to download them. I tried writing a script to automate the downloading of the files. However, I have no knowledge of any programming language, so, while it has been a learning experience, it has been unsuccessful in producing anything useful.
The URL for each plate appearance has the same format: "http://baseball.yahoo.co.jp/npb/live?id=2008".@month.@day.@game."&key=".@inning."_".@frame."_".@batter
So something such as the below works in printing the urls:
#!/usr/bin/perl
use LWP 5.64;
@batter = (1..9);
for (my $i = 0; $i < @batter; $i++) {
print "http://baseball.yahoo.co.jp/npb/live?id=2008032001&key=01_1_0$batter[$i]\n";
}
I have one of these for each array; however, I have no idea how to combine them. I also have no idea how to pad the arrays for numbers greater than 9. Finally, this only prints the urls but does not get them. This one gets the pages:
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
my @batter = ('1', '2', '3');
foreach my $i (@batter)
{
# build the URL inside the loop so that each batter gets its own page
my $page = "http://baseball.yahoo.co.jp/npb/live?id=2008032001&key=01_1_0$i";
getprint($page);
}
However, it only works if I only list pages that exist. Lastly, I have this one which asks for user input for an array value (game) and then sees if the url exists:
#!/usr/bin/perl
use LWP 5.64;
my $browser = LWP::UserAgent->new;
$browser->timeout(10);
print "game? "; # Ask for input
$a = <STDIN>;
chomp $a; # Remove the newline at end
my $url = 'http://baseball.yahoo.co.jp/npb/live?id=200803200' . $a . '&key=01_1_01';
my $response = $browser->get( $url );
if($response->is_success) {
print "Exists -- $url";
} else {
print "Does not exist -- $url";
}
It seems as though I have the pieces to make something work; I just cannot put them together. Any help would be appreciated.
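As a minimal sketch of how the three pieces above might fit together: nested loops build every candidate URL, sprintf "%02d" pads the inning and batter numbers to two digits (which covers numbers greater than 9), and the is_success check skips pages that do not exist. The sub names plate_urls and fetch_game, the limit of two frames per inning, and the cap of 20 batters per frame are assumptions made for illustration; none of them is confirmed by the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build every candidate URL for one game id (e.g. 2008032001).
# Assumptions: frames are numbered 1 and 2; $max_batter is a guessed cap.
sub plate_urls {
    my ($game, $max_inning, $max_batter) = @_;
    my @urls;
    for my $inning (1 .. $max_inning) {
        for my $frame (1 .. 2) {
            for my $batter (1 .. $max_batter) {
                # "%02d" turns 1 into "01" and leaves 10 as "10"
                my $key = sprintf "%02d_%d_%02d", $inning, $frame, $batter;
                push @urls, "http://baseball.yahoo.co.jp/npb/live?id=$game&key=$key";
            }
        }
    }
    return @urls;
}

# Fetch each candidate URL and save only the pages that exist.
# The directory $dir must already exist.
sub fetch_game {
    my ($game, $dir) = @_;
    require LWP::UserAgent;    # loaded only when pages are actually fetched
    my $browser = LWP::UserAgent->new;
    $browser->timeout(10);
    for my $url (plate_urls($game, 9, 20)) {
        my $response = $browser->get($url);
        next unless $response->is_success;    # missing page: skip it
        my ($key) = $url =~ /key=(\w+)$/;
        open my $out, '>', "$dir/${game}_$key.html" or die "cannot write: $!";
        print {$out} $response->content;
        close $out;
    }
    return;
}

# Usage: perl getgame.pl 2008032001 pages
fetch_game(@ARGV) if @ARGV == 2;
```

Keeping the URL building separate from the fetching means the list of URLs can be printed and checked by eye before anything is downloaded.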
Applications
Despite my troubles in downloading the pages, I have started to examine the data in some detail. One application of the data is the creation of "tendency" pages. For example, here are "tendency" pages for a batter (Norihiro Akahoshi) and a pitcher (Tetsuya Utsumi).
The top section (orange/red) shows basic player information and data totals. For instance Akahoshi's page shows that in his 259 plate appearances he has faced 1728 pitches for an average of 6.672 per plate appearance. Utsumi has thrown an average of 6.159 pitches per batter.
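The per-plate-appearance average is simply total pitches divided by plate appearances. A small sketch using Akahoshi's totals quoted above (the sub name pitches_per_pa is made up for illustration):

```perl
use strict;
use warnings;

# Average pitches seen per plate appearance, rounded to three decimals.
sub pitches_per_pa {
    my ($pitches, $plate_appearances) = @_;
    return unless $plate_appearances;    # avoid dividing by zero
    return sprintf "%.3f", $pitches / $plate_appearances;
}

print pitches_per_pa(1728, 259), "\n";    # prints 6.672
```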
The second section shows the pitches faced or used in each count "class" by how much they favor each player. The numbers were from this comment based on Linear Weights. The table below shows the LWTS by count and the classification of each count:
The pitch types are (in order, top to bottom): Straight, Curve, Forkball, Slider, Shuuto, Sinker, Changeup, Cutter, Special (may be any), and unknown. Akahoshi faces a lot of straights and sliders (combined ~75%), although on good hitters' counts the pitches tend to be grooved in with straights. Utsumi is primarily a three-pitch pitcher (straight, slider, changeup) who uses his off-speed pitches in the middle of at-bats.
The next four sections are split into "versus left" and "versus right".
The first section ("Stats") is blank, and I am not sure whether it should be kept, as these stats are available everywhere. The next section, "Zones", shows zonal tendencies. For pitchers, the "PIT#" column shows where a batter would be standing. So we can see that Utsumi likes to throw low and away to left-handed batters as well as right-handed batters, but also throws low and inside to righties. PIT# is the number of pitches, and the top number under each zone indicates the percentage of pitches inside the strike zone; the other percentage is outside. The last row shows the percentage of pitches that resulted in strikes or fouls (str.+foul), balls or passed balls (ball+PB), and everything else (other), such as hits, in-play outs, etc.
For batters, the section is the same except for the "location" of where the batter is standing. For instance, since Akahoshi is a left-hander, we can see that he tends to be pitched outside by both lefties and righties. Left-handers tend to throw more off the lower outside corner than right-handers, who throw low and low-inside a bit more, most likely because of the number of sliders that he sees. Right-handers also throw high and outside a bit more. I think I have batter handedness by pitch recorded, although for the most part it would seem batters would go for the platoon split.
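The last row of the "Zones" section (str.+foul, ball+PB, other) could be tallied roughly as follows. This is only a sketch: the result labels strike, foul, ball and passed_ball are hypothetical, since the actual labels in the downloaded pages are not shown here, and anything not listed is counted as "other".

```perl
use strict;
use warnings;

# Hypothetical mapping from a pitch result to its summary group.
my %group_of = (
    strike => 'str.+foul',
    foul   => 'str.+foul',
    ball   => 'ball+PB',
    passed_ball => 'ball+PB',
);

# Return the percentage of pitches in each group, to one decimal place.
sub result_percentages {
    my @results = @_;
    my $total = @results or return;    # nothing to tally
    my %count = ('str.+foul' => 0, 'ball+PB' => 0, 'other' => 0);
    $count{ $group_of{$_} // 'other' }++ for @results;
    my %pct;
    $pct{$_} = sprintf("%.1f", 100 * $count{$_} / $total) for keys %count;
    return %pct;
}
```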
The "Spray" section shows where batted balls ended up on a poor representation of a field. The four cell positions show (from top-left, clockwise) total batted balls, grounders, line drives, and fly balls. The positions are typical, although the shortstop is above the third baseman. Green represents infielders, pitchers and catchers, blue is for outfielders, red is for home runs, and foul outs are in the orange section near the guide. The totals-by sections are shaded by frequency. Akahoshi is a push-hitter, especially against lefties, while Utsumi gets righties to pull the ball. The surrounding cells in the corners show the total number of batted balls of each type, and the last row shows batted balls by field thirds: left (left field, 3rd base, shortstop), center (center field, pitcher, catcher), and right (right field, 2nd base, 1st base). [The "created by" section should be vertically aligned]
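The grouping into field thirds described above can be written as a simple lookup table. In this sketch the position abbreviations (LF, 3B, SS and so on) and the sub name count_by_third are assumptions; only the grouping itself comes from the description.

```perl
use strict;
use warnings;

# Field thirds as described above; the abbreviations are assumed.
my %third_of = (
    LF => 'left',   '3B' => 'left',   SS   => 'left',
    CF => 'center', P    => 'center', C    => 'center',
    RF => 'right',  '2B' => 'right',  '1B' => 'right',
);

# Count batted balls per third from a list of fielding positions.
sub count_by_third {
    my %count = (left => 0, center => 0, right => 0);
    for my $position (@_) {
        my $third = $third_of{$position} or next;    # ignore unknown codes
        $count{$third}++;
    }
    return %count;
}

my %spray = count_by_third(qw(SS 3B LF CF 2B SS));
print join(', ', map { "$_: $spray{$_}" } qw(left center right)), "\n";
# prints: left: 4, center: 1, right: 1
```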
The last section, "Type", shows the results of plays ("H"its or "O"uts) by location and batted ball type. The sections are the same as the field thirds, although pitchers and catchers are not included. The batted ball types are grounders (G), line drives (L), and fly balls (F). The final row shows the player's Groundball to Flyball Ratio (GB/FB), the percentage of flyballs that are infield flies (IF/F), and the percentage of outfield flyballs that are home runs (HR/OF). The section shows that Akahoshi tends to push balls low for grounders and line drives, and is a ground ball hitter. Akahoshi is a push-hitter, especially against lefties, while Utsumi is a groundball pitcher, especially versus lefties.
The data can also be used to determine Win Probability Added, as seen here.
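The three figures in the final row follow directly from the batted ball counts. A rough sketch, assuming the counts are kept in a hash with the hypothetical keys grounder, infield_fly, outfield_fly and home_run, and assuming home runs are already included in the outfield fly count:

```perl
use strict;
use warnings;

# Plain ratio; undef when the denominator is zero.
sub ratio {
    my ($num, $den) = @_;
    return $den ? $num / $den : undef;
}

# Percentage; undef when the denominator is zero.
sub pct {
    my ($num, $den) = @_;
    return $den ? 100 * $num / $den : undef;
}

# Takes a hash ref of counts (key names are hypothetical).
sub batted_ball_rates {
    my ($c) = @_;
    my $flies = $c->{infield_fly} + $c->{outfield_fly};
    return (
        'GB/FB' => ratio($c->{grounder},  $flies),              # grounders per fly ball
        'IF/F'  => pct($c->{infield_fly}, $flies),              # % of flies hit in the infield
        'HR/OF' => pct($c->{home_run},    $c->{outfield_fly}),  # % of outfield flies that are HR
    );
}
```

For example, 30 grounders, 5 infield flies, 15 outfield flies and 3 home runs give a GB/FB of 1.5, an IF/F of 25% and an HR/OF of 20%.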
Also, does anyone have a yahooNPB to WestbayID table?
Thanks.
Michael Eng