This is based on my answer to another question on Stack Overflow. That answer works, but it is cumbersome because I ignored Win32::OLE::Enum completely and littered the code with unnecessary loop variables.
Well, such is life when you are trying to decipher how to interpret the structure of a PowerPoint slide.
Slides contain shapes. Shapes can be of various types. For our purposes, we are interested in shapes for which the HasTextFrame property is true. A TextFrame has a TextRange. You can access the TextRange by characters, lines or paragraphs. For our purposes, accessing by paragraphs is the right thing to do.
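To make the hierarchy concrete, here is a minimal sketch that walks the same slide, shape, paragraph structure using plain Perl data instead of OLE objects. The hash keys borrow the PowerPoint property names, but the data layout and the slide_paragraphs helper are my own stand-ins for illustration, not part of any API.

```perl
use strict;
use warnings;

# Stand-in data: plain hashes and arrays that mimic the object hierarchy.
# The keys borrow PowerPoint property names, but nothing here talks to OLE.
my @slides = (
    {
        Name   => 'Slide1',
        Shapes => [
            { HasTextFrame => 1, Paragraphs => [ 'A title', 'A subtitle' ] },
            { HasTextFrame => 0 },    # e.g. a picture: nothing to extract
        ],
    },
    {
        Name   => 'Slide2',
        Shapes => [
            { HasTextFrame => 1, Paragraphs => [ 'First point', 'Second point' ] },
        ],
    },
);

# Return the paragraphs of one slide, skipping shapes without a text frame.
sub slide_paragraphs {
    my ($slide) = @_;
    my @pars;
    for my $shape ( @{ $slide->{Shapes} } ) {
        next unless $shape->{HasTextFrame};
        push @pars, @{ $shape->{Paragraphs} };
    }
    return @pars;
}

for my $slide (@slides) {
    print "$slide->{Name}: $_\n" for slide_paragraphs($slide);
}
```

The real script has the same shape: an outer loop over slides, a filter on HasTextFrame, and an inner loop over paragraphs.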
Once we have a paragraph, we ship it off to the print_par routine. The code in this example prints non-bullet text as well, but treats bullet text specially. We first look at the Bullet property of the ParagraphFormat associated with the paragraph at hand to find out the bullet type. In addition to the ppBulletNumbered and ppBulletUnnumbered types, there are also the ppBulletNone, ppBulletMixed, and ppBulletPicture bullet types. Finally, there are a bazillion numbered bullet styles (see PpNumberedBulletStyle) which you should take into account if you care about such things.
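If you do care about numbered bullet styles, one possible approach is to turn the bullet number into a label according to the style. The sketch below is pure Perl and only covers a handful of styles. The string keys are hypothetical names I chose to mirror some PpNumberedBulletStyle members; in a real script you would presumably compare the Style property of the bullet format against the imported constants instead. The alphabetic rendering past 26 (aa, ab, ...) is an assumption, and PowerPoint may well number differently there.

```perl
use strict;
use warnings;

# Hypothetical style keys, named after a few PpNumberedBulletStyle members.
# They are plain strings for illustration, not the real enum values.
my %render = (
    ArabicPeriod     => sub { $_[0] . '.' },
    ArabicParenRight => sub { $_[0] . ')' },
    AlphaLCPeriod    => sub { to_alpha( $_[0] ) . '.' },
    AlphaUCPeriod    => sub { uc( to_alpha( $_[0] ) ) . '.' },
    RomanLCPeriod    => sub { lc( to_roman( $_[0] ) ) . '.' },
    RomanUCPeriod    => sub { to_roman( $_[0] ) . '.' },
);

# 1 -> a, 2 -> b, ..., 26 -> z, 27 -> aa (assumed continuation).
sub to_alpha {
    my ($n) = @_;
    my $s = '';
    while ( $n > 0 ) {
        $n--;
        $s = chr( ord('a') + $n % 26 ) . $s;
        $n = int( $n / 26 );
    }
    return $s;
}

# Standard subtractive Roman numerals, upper case.
sub to_roman {
    my ($n) = @_;
    my @table = (
        [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
        [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
        [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
        [ 1,    'I' ],
    );
    my $s = '';
    for my $pair (@table) {
        my ( $value, $glyph ) = @$pair;
        while ( $n >= $value ) {
            $s .= $glyph;
            $n -= $value;
        }
    }
    return $s;
}

# Fall back to the bare number for styles this sketch does not know.
sub bullet_label {
    my ( $number, $style ) = @_;
    my $fmt = $render{$style} or return $number;
    return $fmt->($number);
}

print bullet_label( 2, 'AlphaLCPeriod' ), "\n";    # b.
print bullet_label( 2, 'RomanUCPeriod' ), "\n";    # II.
```

Falling back to the bare number keeps the behaviour of a script that ignores styles, so unknown styles degrade gracefully rather than failing.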
Here is the script:
#!/usr/bin/perl
use strict; use warnings;
use Try::Tiny;
use Win32::OLE;
use Win32::OLE::Const qw( Microsoft.PowerPoint );
use Win32::OLE::Enum;
$Win32::OLE::Warn = 3;
my $ppt = get_ppt();
binmode STDOUT, ':utf8';
# The second argument asks PowerPoint to open the file read-only.
my $presentation = $ppt->Presentations->Open('test.ppt', 1);
my $slides = Win32::OLE::Enum->new( $presentation->Slides );

SLIDE:
while ( my $slide = $slides->Next ) {
    my $name = $slide->Name;
    printf "=== Begin slide: %s ===\n", $name;
    my $shapes = Win32::OLE::Enum->new( $slide->Shapes );
    SHAPE:
    while ( my $shape = $shapes->Next ) {
        # Only shapes with a text frame can carry paragraphs.
        next SHAPE unless $shape->HasTextFrame;
        my $pars = Win32::OLE::Enum->new(
            $shape->TextFrame->TextRange->Paragraphs
        );
        PARAGRAPH:
        while ( my $par = $pars->Next ) {
            print_par( $par );
        }
    }
    printf "=== End slide: %s ===\n\n", $name;
}

$presentation->Close;

sub print_par {
    my $par = shift;
    my $indent  = $par->IndentLevel;
    my $bformat = $par->ParagraphFormat->Bullet;
    my $btype   = $bformat->Type;
    my $bchar;
    # see also PpNumberedBulletStyle
    $bchar = $btype == ppBulletNumbered   ? $bformat->Number
           : $btype == ppBulletUnnumbered ? chr $bformat->Character
           : $btype == ppBulletMixed      ? '[X]'
           : $btype == ppBulletPicture    ? '[IMG]'
           :                                '';
    my $text = $par->Text;
    $text =~ s/\s+$//;
    print(
        "\t" x ($indent - 1),
        $bchar ? ($bchar, ' ') : '',
        $text,
        "\n",
    );
}

sub get_ppt {
    my $ppt;
    try { $ppt = Win32::OLE->GetActiveObject('PowerPoint.Application') }
    catch { die $_ };
    unless ( $ppt ) {
        $ppt = Win32::OLE->new(
            'PowerPoint.Application', sub { $_[0]->Quit }
        ) or die sprintf(
            'Cannot start PowerPoint: %s', Win32::OLE->LastError
        );
    }
    return $ppt;
}
And here is sample output from running it against a very simple PowerPoint presentation:
=== Begin slide: Slide1 ===
This is a test presentation
subtitle
=== End slide: Slide1 ===

=== Begin slide: Slide2 ===
A bullet list
This is not a bullet
• Ya da
    – Da da
    – Ga ga
• Du da
1 Nu da
[IMG] Do da
=== End slide: Slide2 ===

=== Begin slide: Slide3 ===
A numbered list
1 One
    1 One a
    2 One b
2 Two
    1 Two I
    2 Two II
=== End slide: Slide3 ===
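
The layout of that output comes entirely from the formatting step in print_par: one tab per indent level beyond the first, then the bullet prefix, then the text. As a rough sketch, that step can be pulled out into a pure function that needs no OLE objects and is therefore easy to test. The format_par name and the sample data are my own, and the sketch tests the bullet with length rather than truthiness, which is a small deliberate difference from the script.

```perl
use strict;
use warnings;

# Pure-Perl stand-in for the formatting step of print_par: no OLE involved.
# $indent is the 1-based indent level, $bullet the already-chosen prefix
# ('' for non-bullet text).
sub format_par {
    my ( $indent, $bullet, $text ) = @_;
    $text =~ s/\s+\z//;    # drop trailing whitespace, including a paragraph break
    my $prefix = length($bullet) ? "$bullet " : '';
    return ( "\t" x ( $indent - 1 ) ) . $prefix . $text . "\n";
}

# Hypothetical paragraphs shaped like the start of slide 3 above.
my @pars = (
    [ 1, '',  "A numbered list\r" ],
    [ 1, '1', "One\r" ],
    [ 2, '1', "One a\r" ],
    [ 2, '2', "One b\r" ],
    [ 1, '2', "Two" ],
);

print format_par(@$_) for @pars;
```

With the formatting separated like this, print_par would only have to collect the indent level, bullet and text from the paragraph object and hand them over.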