class PDF::Reader::PageState
encapsulates logic for tracking graphics state as the instructions for a single page are processed. Most of the public methods correspond directly to PDF operators.
Constants
- DEFAULT_GRAPHICS_STATE
Public Class Methods
starting a new page
# File lib/pdf/reader/page_state.rb, line 23
def initialize(page)
  @page          = page
  @cache         = page.cache
  @objects       = page.objects
  @font_stack    = [build_fonts(page.fonts)]
  @xobject_stack = [page.xobjects]
  @cs_stack      = [page.color_spaces]
  @stack         = [DEFAULT_GRAPHICS_STATE.dup]
  state[:ctm]    = identity_matrix
end
Public Instance Methods
Text Object Operators
# File lib/pdf/reader/page_state.rb, line 81
def begin_text_object
  @text_matrix      = identity_matrix
  @text_line_matrix = identity_matrix
  @font_size        = nil
end
This returns a deep clone of the current state, ensuring changes are kept separate from earlier states.
Marshal is used to round-trip the state through a string to easily perform the deep clone. Kinda hacky, but effective.
# File lib/pdf/reader/page_state.rb, line 282
def clone_state
  if @stack.empty?
    {}
  else
    Marshal.load Marshal.dump(@stack.last)
  end
end
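The reason for the Marshal round-trip, rather than a plain dup, can be seen in a minimal sketch. The hash below is a hypothetical stand-in for a graphics state, not the real DEFAULT_GRAPHICS_STATE, and the helper names sample_state and deep_clone are made up for illustration.

```ruby
# Hypothetical stand-in for a graphics state that holds a nested, mutable value.
def sample_state
  { :char_spacing => 0, :ctm => [1, 0, 0, 1, 0, 0] }
end

# Deep clone by serialising to a string and loading it back,
# the same technique clone_state uses.
def deep_clone(obj)
  Marshal.load(Marshal.dump(obj))
end

original = sample_state
shallow  = original.dup          # copies the hash, shares the nested array
deep     = deep_clone(original)  # copies the hash and everything inside it

# Mutate the nested array through the original.
original[:ctm][4] = 72

puts shallow[:ctm].inspect  # [1, 0, 0, 1, 72, 0]
puts deep[:ctm].inspect     # [1, 0, 0, 1, 0, 0]
```

The shallow copy sees the mutation because it still points at the same nested array, while the deep clone does not. Note that Marshal can only serialise objects that support dumping, so this approach would not suit a state that holds procs or IO objects.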
update the current transformation matrix.
If the CTM is currently undefined, just store the new values.
If there's an existing CTM, then multiply the existing matrix with the new matrix to form the updated matrix.
# File lib/pdf/reader/page_state.rb, line 63
def concatenate_matrix(a, b, c, d, e, f)
  if state[:ctm]
    ctm = state[:ctm]
    state[:ctm] = TransformationMatrix.new(a,b,c,d,e,f).multiply!(
      ctm.a, ctm.b, ctm.c,
      ctm.d, ctm.e, ctm.f
    )
  else
    state[:ctm] = TransformationMatrix.new(a,b,c,d,e,f)
  end
  @text_rendering_matrix = nil # invalidate cached value
end
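The two cases described above can be sketched with plain arrays. This is an illustration only: multiply_affine and concatenate are hypothetical helpers, not part of PDF::Reader, and the sketch assumes the new matrix is the left operand of the product, which is how the code above calls multiply!.

```ruby
# Multiply two affine matrices given as [a, b, c, d, e, f], each standing for
#   | a b 0 |
#   | c d 0 |
#   | e f 1 |
# Returns m x n as a new six-element array.
def multiply_affine(m, n)
  a1, b1, c1, d1, e1, f1 = m
  a2, b2, c2, d2, e2, f2 = n
  [
    a1 * a2 + b1 * c2,
    a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,
    c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2,
    e1 * b2 + f1 * d2 + f2
  ]
end

# Store the new matrix when no CTM is defined yet,
# otherwise multiply the new matrix by the existing CTM.
def concatenate(ctm, new_matrix)
  ctm.nil? ? new_matrix : multiply_affine(new_matrix, ctm)
end

ctm = nil
ctm = concatenate(ctm, [2, 0, 0, 2, 0, 0])    # scale by 2
ctm = concatenate(ctm, [1, 0, 0, 1, 10, 20])  # translate within the scaled space
puts ctm.inspect  # [2, 0, 0, 2, 20, 40]
```

The translation of (10, 20) ends up as (20, 40) because it is expressed in the already scaled user space.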
transform x and y co-ordinates from the current user space to the underlying device space.
# File lib/pdf/reader/page_state.rb, line 218
def ctm_transform(x, y)
  [
    (ctm.a * x) + (ctm.c * y) + (ctm.e),
    (ctm.b * x) + (ctm.d * y) + (ctm.f)
  ]
end
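As a worked example of the same arithmetic, the sketch below applies a matrix held in a plain array to a point. The helper transform_point and the sample matrix are hypothetical and exist only for illustration.

```ruby
# Apply an affine matrix [a, b, c, d, e, f] to the point (x, y),
# using the same arithmetic as ctm_transform.
def transform_point(matrix, x, y)
  a, b, c, d, e, f = matrix
  [(a * x) + (c * y) + e, (b * x) + (d * y) + f]
end

# Hypothetical CTM: scale by 2 and offset by (20, 40).
sample_ctm = [2, 0, 0, 2, 20, 40]

puts transform_point(sample_ctm, 0, 0).inspect   # [20, 40]
puts transform_point(sample_ctm, 5, 10).inspect  # [30, 60]
```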
# File lib/pdf/reader/page_state.rb, line 243
def current_font
  find_font(state[:text_font])
end
# File lib/pdf/reader/page_state.rb, line 87
def end_text_object
  # don't need to do anything
end
# File lib/pdf/reader/page_state.rb, line 254
def find_color_space(label)
  dict = @cs_stack.detect { |colorspaces|
    colorspaces.has_key?(label)
  }
  dict ? dict[label] : nil
end

# File lib/pdf/reader/page_state.rb, line 247
def find_font(label)
  dict = @font_stack.detect { |fonts|
    fonts.has_key?(label)
  }
  dict ? dict[label] : nil
end

# File lib/pdf/reader/page_state.rb, line 261
def find_xobject(label)
  dict = @xobject_stack.detect { |xobjects|
    xobjects.has_key?(label)
  }
  dict ? dict[label] : nil
end
# File lib/pdf/reader/page_state.rb, line 108
def font_size
  @font_size ||= begin
    _, zero = trm_transform(0,0)
    _, one  = trm_transform(1,1)
    (zero - one).abs
  end
end
XObjects
# File lib/pdf/reader/page_state.rb, line 189
def invoke_xobject(label)
  save_graphics_state
  xobject = find_xobject(label)

  raise MalformedPDFError, "XObject #{label} not found" if xobject.nil?
  matrix = xobject.hash[:Matrix]
  concatenate_matrix(*matrix) if matrix

  if xobject.hash[:Subtype] == :Form
    form = PDF::Reader::FormXObject.new(@page, xobject, :cache => @cache)
    @font_stack.unshift(form.font_objects)
    @xobject_stack.unshift(form.xobjects)
    yield form if block_given?
    @font_stack.shift
    @xobject_stack.shift
  else
    yield xobject if block_given?
  end

  restore_graphics_state
end
Text Positioning Operators
# File lib/pdf/reader/page_state.rb, line 136
def move_text_position(x, y) # Td
  temp = TransformationMatrix.new(1, 0, 0, 1, x, y)
  @text_line_matrix = temp.multiply!(
    @text_line_matrix.a, @text_line_matrix.b, @text_line_matrix.c,
    @text_line_matrix.d, @text_line_matrix.e, @text_line_matrix.f
  )
  @text_matrix = @text_line_matrix.dup
  @font_size = @text_rendering_matrix = nil # invalidate cached value
end

# File lib/pdf/reader/page_state.rb, line 149
def move_text_position_and_set_leading(x, y) # TD
  set_text_leading(-1 * y)
  move_text_position(x, y)
end

# File lib/pdf/reader/page_state.rb, line 176
def move_to_next_line_and_show_text(str) # '
  move_to_start_of_next_line
end

# File lib/pdf/reader/page_state.rb, line 164
def move_to_start_of_next_line # T*
  move_text_position(0, -state[:text_leading])
end
after each glyph is painted onto the page the text matrix must be modified. There's no defined operator for this, but depending on the use case some receivers may need to mutate the state with this while walking a page.
NOTE: some of the variable names in this method are obscure because they mirror variable names from the PDF spec
NOTE: see Section 9.4.4, PDF 32000-1:2008, p. 252
Arguments:
w0 - the glyph width in *text space*. This generally means the width in glyph space should be divided by 1000 before being passed to this function
tj - any kerning that should be applied to the text matrix before the following glyph is painted. This is usually the numeric arguments in the array passed to a TJ operator
word_boundary - a boolean indicating if a word boundary was just reached. Depending on the current state extra space may need to be added
# File lib/pdf/reader/page_state.rb, line 312
def process_glyph_displacement(w0, tj, word_boundary)
  fs = font_size # font size
  tc = state[:char_spacing]
  if word_boundary
    tw = state[:word_spacing]
  else
    tw = 0
  end
  th = state[:h_scaling]
  # optimise the common path to reduce Float allocations
  if th == 1 && tj == 0 && tc == 0 && tw == 0
    glyph_width = w0 * fs
    tx = glyph_width
  else
    glyph_width = ((w0 - (tj/1000.0)) * fs) * th
    tx = glyph_width + ((tc + tw) * th)
  end

  # TODO: I'm pretty sure that tx shouldn't need to be divided by
  #       ctm[0] here, but this gets my tests green and I'm out of
  #       ideas for now
  # TODO: support ty > 0
  if ctm.a == 1 || ctm.a == 0
    @text_matrix.horizontal_displacement_multiply!(tx)
  else
    @text_matrix.horizontal_displacement_multiply!(tx/ctm.a)
  end
  @font_size = @text_rendering_matrix = nil # invalidate cached value
end
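The displacement arithmetic can be tried in isolation. The sketch below reproduces only the general branch of the tx calculation as a standalone function. The name glyph_tx and the sample glyph values are hypothetical, and the CTM adjustment at the end of the method is left out.

```ruby
# Horizontal displacement after painting one glyph, following the general
# branch of process_glyph_displacement.
#   w0 - glyph width in text space
#   tj - TJ adjustment, in thousandths of a text space unit
#   fs - font size
#   tc - character spacing
#   tw - word spacing (pass 0 when not at a word boundary)
#   th - horizontal scaling as a fraction (1.0 means 100%)
def glyph_tx(w0, tj, fs, tc, tw, th)
  glyph_width = ((w0 - (tj / 1000.0)) * fs) * th
  glyph_width + ((tc + tw) * th)
end

# Hypothetical glyph: 500 units wide in glyph space, so 0.5 in text space.
w0 = 500 / 1000.0

puts glyph_tx(w0, 0, 12, 0, 0, 1.0)     # 6.0
puts glyph_tx(w0, -250, 12, 1, 2, 1.0)  # 12.0
```

In the second call the negative TJ value widens the advance by 0.25 text space units, giving 0.75 * 12 = 9.0, and the character and word spacing add a further 3.0.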
Restore the state to the previous value on the stack.
# File lib/pdf/reader/page_state.rb, line 48
def restore_graphics_state
  @stack.pop
end
Clones the current graphics state and pushes it onto the top of the stack. Any changes that are subsequently made to the state can then be reversed by calling restore_graphics_state.
# File lib/pdf/reader/page_state.rb, line 42
def save_graphics_state
  @stack.push clone_state
end
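The save and restore pair behaves like a simple stack of independent copies. The class below is a minimal stand-in written for illustration. StateStack and its method names are hypothetical and are not part of PDF::Reader.

```ruby
# Minimal stand-in for the graphics state stack; not the real PageState class.
class StateStack
  def initialize(initial)
    @stack = [initial]
  end

  # The state currently in effect is always the top of the stack.
  def state
    @stack.last
  end

  # Push a deep copy so later changes leave the saved state untouched.
  def save
    @stack.push(Marshal.load(Marshal.dump(state)))
  end

  # Discard the current state, exposing the one saved before it.
  def restore
    @stack.pop
  end

  def depth
    @stack.size
  end
end

stack = StateStack.new({ :char_spacing => 0 })
stack.save
stack.state[:char_spacing] = 5
puts stack.state[:char_spacing]  # 5
stack.restore
puts stack.state[:char_spacing]  # 0
```

In a PDF content stream these two steps correspond to the q and Q operators.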
Text State Operators
# File lib/pdf/reader/page_state.rb, line 95
def set_character_spacing(char_spacing)
  state[:char_spacing] = char_spacing
end

# File lib/pdf/reader/page_state.rb, line 99
def set_horizontal_text_scaling(h_scaling)
  state[:h_scaling] = h_scaling / 100.0
end

# File lib/pdf/reader/page_state.rb, line 180
def set_spacing_next_line_show_text(aw, ac, string) # "
  set_word_spacing(aw)
  set_character_spacing(ac)
  move_to_next_line_and_show_text(string)
end

# File lib/pdf/reader/page_state.rb, line 103
def set_text_font_and_size(label, size)
  state[:text_font] = label
  state[:text_font_size] = size
end

# File lib/pdf/reader/page_state.rb, line 116
def set_text_leading(leading)
  state[:text_leading] = leading
end

# File lib/pdf/reader/page_state.rb, line 154
def set_text_matrix_and_text_line_matrix(a, b, c, d, e, f) # Tm
  @text_matrix = TransformationMatrix.new(
    a, b,
    c, d,
    e, f
  )
  @text_line_matrix = @text_matrix.dup
  @font_size = @text_rendering_matrix = nil # invalidate cached value
end

# File lib/pdf/reader/page_state.rb, line 120
def set_text_rendering_mode(mode)
  state[:text_mode] = mode
end

# File lib/pdf/reader/page_state.rb, line 124
def set_text_rise(rise)
  state[:text_rise] = rise
end

# File lib/pdf/reader/page_state.rb, line 128
def set_word_spacing(word_spacing)
  state[:word_spacing] = word_spacing
end
Text Showing Operators
# File lib/pdf/reader/page_state.rb, line 172
def show_text_with_positioning(params) # TJ
  # TODO record position changes in state here
end
When #save_graphics_state is called, we need to push a new copy of the current state onto the stack. That way any modifications to the state will be undone once #restore_graphics_state is called. This method returns the number of states currently on that stack.
# File lib/pdf/reader/page_state.rb, line 272
def stack_depth
  @stack.size
end
transform x and y co-ordinates from the current text space to the underlying device space.
transforming (0,0) is a really common case, so optimise for it to avoid unnecessary object allocations
# File lib/pdf/reader/page_state.rb, line 231
def trm_transform(x, y)
  trm = text_rendering_matrix
  if x == 0 && y == 0
    [trm.e, trm.f]
  else
    [
      (trm.a * x) + (trm.c * y) + (trm.e),
      (trm.b * x) + (trm.d * y) + (trm.f)
    ]
  end
end
Private Instance Methods
return the current transformation matrix
# File lib/pdf/reader/page_state.rb, line 370
def ctm
  state[:ctm]
end
This class uses 3x3 matrices to represent geometric transformations. These matrices are represented by arrays with 9 elements. The array [a,b,c,d,e,f,g,h,i] would represent a matrix like:
a b c
d e f
g h i
# File lib/pdf/reader/page_state.rb, line 399
def identity_matrix
  TransformationMatrix.new(1, 0, 0, 1, 0, 0)
end

# File lib/pdf/reader/page_state.rb, line 374
def state
  @stack.last
end
Used for many and varied text positioning calculations. We potentially need to access the results of this method many times when working with text, so the result is memoized until the text state changes.
# File lib/pdf/reader/page_state.rb, line 348
def text_rendering_matrix
  @text_rendering_matrix ||= begin
    state_matrix = TransformationMatrix.new(
      state[:text_font_size] * state[:h_scaling], 0,
      0, state[:text_font_size],
      0, state[:text_rise]
    )
    state_matrix.multiply!(
      @text_matrix.a, @text_matrix.b, @text_matrix.c,
      @text_matrix.d, @text_matrix.e, @text_matrix.f
    )
    state_matrix.multiply!(
      ctm.a, ctm.b, ctm.c,
      ctm.d, ctm.e, ctm.f
    )
  end
end
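The product built above can be sketched with plain arrays: a matrix of font parameters, multiplied by the text matrix, multiplied by the CTM. The helpers multiply_affine and sketch_trm and the sample values are hypothetical. The sketch omits memoization and assumes multiply! computes receiver times argument.

```ruby
# Multiply two affine matrices given as [a, b, c, d, e, f]; returns m x n.
def multiply_affine(m, n)
  a1, b1, c1, d1, e1, f1 = m
  a2, b2, c2, d2, e2, f2 = n
  [
    a1 * a2 + b1 * c2,
    a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,
    c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2,
    e1 * b2 + f1 * d2 + f2
  ]
end

# Text rendering matrix: font parameters x text matrix x CTM.
def sketch_trm(font_size, h_scaling, rise, text_matrix, ctm)
  font_matrix = [font_size * h_scaling, 0, 0, font_size, 0, rise]
  multiply_affine(multiply_affine(font_matrix, text_matrix), ctm)
end

# Hypothetical values: 12 point text positioned at (100, 700), identity CTM.
trm = sketch_trm(12, 1.0, 0, [1, 0, 0, 1, 100, 700], [1, 0, 0, 1, 0, 0])
puts trm.inspect
```

With these values the fourth element of the result is 12 and the second is 0, so the vertical distance between the transformed points (0,0) and (1,1) is 12, which is the figure font_size derives from trm_transform.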