Author Topic: Tokenize a line of text (Read 54114 times)

franji1 · February 16, 2021, 12:03:29 PM

This sample project was written in 2.5 for the Simulator. A task called Tokenize takes a CSV line of real numbers separated by commas and splits it into a block of strings called Tokens (Tokens0, Tokens1, Tokens2, etc.). A second task, ParseTokens, then parses the strings in the Tokens block and builds the equivalent block of real values in ParsedData: the real value in ParsedData0 corresponds to the text in Tokens0, ParsedData1 corresponds to Tokens1, and so on. It can handle up to 100 values in a single line, because Tokens is a block of 100 STRINGs and ParsedData is a block of 100 REALs.
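For readers who want the two-step idea outside of ladder logic, here is a minimal Python sketch of it. This is not the Do-more code in the attached project. The function names `tokenize` and `parse_tokens` and the constant `MAX_TOKENS` are hypothetical, and dropping anything past the 100th token is an assumption, since the post does not say what happens on overflow.

```python
MAX_TOKENS = 100  # mirrors the 100-element Tokens and ParsedData blocks


def tokenize(text: str, max_tokens: int = MAX_TOKENS) -> list:
    """Split a comma-separated line into at most max_tokens strings (the Tokens block)."""
    tokens = text.split(",")
    # Assumption: tokens beyond max_tokens are simply dropped
    return tokens[:max_tokens]


def parse_tokens(tokens: list) -> list:
    """Convert each token to a float, index for index (Tokens[i] -> ParsedData[i])."""
    return [float(tok) for tok in tokens]


tokens = tokenize("3,1,4,1,5,9,2,6,5")
parsed = parse_tokens(tokens)
num_tokens = len(tokens)  # plays the role of D101 in the project
print(num_tokens, parsed)
```

Keeping the split and the conversion as separate steps matches the project's split between Tokenize and ParseTokens: the string block can be inspected on its own before any numeric conversion happens.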

D101 holds the number of tokens parsed from the source text line (SL0).

Note that your data does not have to be numeric; it can be text or a mixture. You would need to modify the ParseTokens task to perform whatever processing you actually need on each text string (Tokens0, Tokens1, Tokens2, etc.) extracted from the original text string SL0.

EDIT: The Tokenize logic does NOT need to change if the tokens are text or a mix of text and numeric values, as long as they are separated by commas.
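To illustrate that only the per-token processing changes while the comma split stays the same, here is a small Python sketch. `process_token` is a hypothetical stand-in for a modified ParseTokens task, and the rule "numeric text becomes a number, anything else stays text" is just one possible choice.

```python
def tokenize(text: str) -> list:
    # Same comma split regardless of whether tokens are numeric or text
    return text.split(",")


def process_token(token: str):
    """Hypothetical per-token handler: numeric tokens become floats, others stay text.

    Note: Python's float() also accepts strings such as "nan" and "inf",
    so those tokens would be treated as numeric here.
    """
    try:
        return float(token)
    except ValueError:
        return token.strip()


line = "PUMP1,42.5,ON,7"
values = [process_token(t) for t in tokenize(line)]
print(values)
```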

EDIT2: D101 is CALCULATED; it is an OUTPUT of the Tokenize task. So if you ENTASK ParseTokens with "12,32,88" in SL0, D101 will be set to 3. If you ENTASK ParseTokens with "14,8,3,10,5,28,18,39" in SL0, D101 will be set to 8.
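The token count in these two examples is simply the number of commas plus one. A one-function Python sketch of that arithmetic follows. `count_tokens` is a hypothetical name, and how the project counts empty tokens (for example, from a trailing comma) is not stated in the post, so this sketch counts them like any other token.

```python
def count_tokens(text: str) -> int:
    """Number of comma-separated tokens, i.e. the value D101 reports after Tokenize runs."""
    # Assumption: empty fields (e.g. from "1,,2") count as tokens
    return len(text.split(","))


print(count_tokens("12,32,88"))              # 3
print(count_tokens("14,8,3,10,5,28,18,39"))  # 8
```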

Attached is a screenshot of a sample run with the text in SL0 set to "3,1,4,1,5,9,2,6,5", showing the Tokens elements, D101 (NumTokens), and the ParsedData values in Data Views.
