(PUP-2411) Make Program/Locator contain byte vs char offset encoding
2c56f5e80924 (Unpublished)

Description

(PUP-2411) Make Program/Locator contain byte vs char offset encoding

As discussed in the ticket. Ruby before 2.0.0 has no way to efficiently
scan a string by character (scanning is done per byte). In Ruby 2.0.0 the
StringScanner can report character positions instead of byte
positions. It is, however, slower than using the byte positions.
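
To illustrate the difference between the two kinds of position, here is a
minimal sketch using StringScanner#pos (byte offset) and
StringScanner#charpos (character offset, available from Ruby 2.0.0). The
sample text and variable names are assumptions made for illustration and
are not taken from the lexer.

```ruby
require 'strscan'

# Hypothetical sample text; "\u00E9" (e with acute accent) is two bytes in UTF-8.
source = "\u00E9 = 1"
scanner = StringScanner.new(source)

scanner.scan(/[^ ]+/)        # consume the first token (the single accented char)
byte_pos = scanner.pos       # offset after the match, counted in bytes
char_pos = scanner.charpos   # offset after the match, counted in characters

puts "byte offset: #{byte_pos}, char offset: #{char_pos}"
```

The two offsets diverge as soon as a multibyte character has been scanned.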

Making 1.8.7 or 1.9.3 record offsets in chars would be prohibitively
slow (a string has to be constructed to be able to know its length for
every token). In Ruby 2.0.0 it is more or less a wash (the byte way of
calculating this was, however, still faster when last measured).
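
The cost described above can be sketched as follows. This is only one way
to derive a character offset from a byte offset, assuming String#byteslice
(Ruby 1.9.3 and later, so not available on 1.8.7); the helper name
char_offset_for is hypothetical. It shows why a new string is built for
every conversion.

```ruby
# Hypothetical helper: convert a byte offset into a character offset.
def char_offset_for(source, byte_offset)
  # byteslice allocates a new String holding the leading bytes; the
  # length of that String in characters is the character offset.
  source.byteslice(0, byte_offset).length
end

source = "caf\u00E9 = 1"   # the accented character is two bytes in UTF-8

# Find the byte offset of "=" by searching a binary-encoded copy.
byte_offset = source.dup.force_encoding(Encoding::BINARY).index("=")
char_offset = char_offset_for(source, byte_offset)

puts "byte offset: #{byte_offset}, char offset: #{char_offset}"
```

Doing this once per token means one extra string allocation per token,
which is the overhead the byte based approach avoids.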

This fix makes it possible to record in the model whether the
offsets in the model are byte or char based. Currently, there is no
implementation that uses char based offsets, so this is preparation for
the future and makes the AST model's API more stable.
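
A rough sketch of the idea, not the actual Program/Locator code: the
class name SketchLocator, its attribute, and its methods are hypothetical
and chosen only to show how a consumer could use a recorded flag to
interpret offsets correctly.

```ruby
# Hypothetical sketch; names are illustrative, not taken from the Puppet source.
class SketchLocator
  attr_reader :source, :offsets_are_char_based

  def initialize(source, offsets_are_char_based = false)
    @source = source
    @offsets_are_char_based = offsets_are_char_based
  end

  # Character offset for a recorded offset, whichever unit was stored.
  def char_offset(offset)
    return offset if offsets_are_char_based
    source.byteslice(0, offset).length
  end

  # Byte offset for a recorded offset, whichever unit was stored.
  def byte_offset(offset)
    return offset unless offsets_are_char_based
    source[0, offset].bytesize
  end
end

text = "caf\u00E9 = 1"                           # accented char is two bytes
byte_locator = SketchLocator.new(text)           # offsets recorded in bytes
char_locator = SketchLocator.new(text, true)     # offsets recorded in chars

puts byte_locator.char_offset(6)   # position of "=" given as a byte offset
puts char_locator.byte_offset(5)   # position of "=" given as a char offset
```

With the flag stored next to the offsets, code reading the model does not
need to know which Ruby version or scanner produced it.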

As a comparison, on other platforms (JVM) it is far easier and more
efficient to get the char positions than the byte positions.

Details

Provenance
Henrik Lindberg <henrik.lindberg@cloudsmith.com> Authored on
vanmeeuwen Pushed on Jun 2 2015, 2:22 PM
Parents
rPU3c663009ec44: Merge branch 'master' into puppet-4
Branches
Unknown
Tags
Unknown

Event Timeline

Henrik Lindberg <henrik.lindberg@cloudsmith.com> committed rPU2c56f5e80924: (PUP-2411) Make Program/Locator contain byte vs char offset encoding (authored by Henrik Lindberg <henrik.lindberg@cloudsmith.com>). Sep 4 2014, 12:19 AM