Suggestion: counting chars vs. counting bytes #5

```
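# print rows whose first field is shorter than 6 characters
awk 'length($1) < 6' table.txt
# character count (in a UTF-8 locale)
echo 'αλεπού' | awk '{print length()}'
# byte count via byte mode (-b is a gawk option)
echo 'αλεπού' | awk -b '{print length()}'
# byte count by forcing the C locale
echo 'αλεπού' | LC_ALL=C awk '{print length()}'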
```

One doesn't need to set `LC_ALL=C` or activate byte mode (`-b`) just to count the exact bytes of the input.

Even in `gawk` unicode mode, use

```
length(str)            # to count UTF8 characters
match(str, /$/) - 1    # to count bytes
```

This works because the code requests a match of the empty string at the tail of the input. Since no actual characters are matched along the way, gawk reports the position back to you as a byte count. The minus 1 is _essential_, because otherwise `RSTART` would sit 1 virtual byte beyond the end of the input string.
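A minimal sketch of the two counts side by side. The helper names `count_chars` and `count_bytes` are made up for illustration, and it assumes `awk` on your PATH is gawk running in a UTF-8 locale (in a C locale, or in a byte-oriented awk, both helpers simply report bytes).

```shell
# Hypothetical helpers, not part of any tool.

# Characters: length() counts characters in gawk under a UTF-8 locale.
count_chars() {
  printf '%s\n' "$1" | awk '{ print length($0) }'
}

# Bytes: the empty match at the tail reports a position one past the
# last byte, so subtract 1 to get the byte count.
count_bytes() {
  printf '%s\n' "$1" | awk '{ print match($0, /$/) - 1 }'
}

count_chars 'αλεπού'   # 6 in gawk under a UTF-8 locale
count_bytes 'αλεπού'   # 12 (six Greek letters, 2 bytes each in UTF-8)
```

Comparing the second number against `printf '%s' 'αλεπού' | wc -c` is a quick sanity check.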

You can directly throw binary files like `.MP3 .MP4 .XZ .PNG` at it, and **gawk** in unicode mode will give you the byte count without any error messages.

That said, only the `match(str, /$/)` form stays silent when you throw binary data at gawk in unicode mode. `length( )` will DEFINITELY scream, and so will `match(str, /.$/)`.

(Note the dot `.` right before `$`: on valid UTF8 input this call style is equivalent to `length( )`, but on random bytes it will DEFINITELY give you the locale error message.)
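To see the equivalence on valid input, here is a small sketch. The helper names `last_char_index` and `char_length` are hypothetical, and it only covers well-formed UTF8 text; it does not try to reproduce the error message on binary data.

```shell
# Hypothetical helpers for comparing the two call styles on valid text.

# Position of the last character: /.$/ matches exactly one character
# at the tail, so RSTART (the return value of match) is its index.
last_char_index() {
  printf '%s\n' "$1" | awk '{ print match($0, /.$/) }'
}

# Plain length() for comparison.
char_length() {
  printf '%s\n' "$1" | awk '{ print length($0) }'
}

# On valid input both helpers print the same number.
last_char_index 'αλεπού'
char_length 'αλεπού'
```

Both numbers are in whatever unit the running awk and locale use (characters in gawk unicode mode, bytes in byte mode), which is why the dot variant inherits the same error messages as `length( )` on invalid input.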

So the dot variant can't be used to circumvent `length( )`'s error message on pure binary input. One needs to code up an alternative approach to count it, e.g. via `gsub( )`.

It took me a while to code it up myself, but now I can get byte mode to count UTF8 characters, and get unicode mode to take in binary data directly and report a count identical to GNU `wc`.
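One possible `gsub( )` approach for the byte-mode direction, as a rough sketch and not necessarily the code mentioned above: count every byte that is not a UTF8 continuation byte (`0x80` to `0xBF`), since each character contributes exactly one such byte. The helper name `utf8_chars_bytemode` is hypothetical, and the sketch assumes the input is well-formed UTF8.

```shell
# Hypothetical helper: count UTF8 characters while awk runs in byte mode.
# LC_ALL=C makes the bracket range below operate on raw byte values.
utf8_chars_bytemode() {
  printf '%s\n' "$1" | LC_ALL=C awk '{
    s = $0                               # work on a copy of the record
    # gsub returns the number of replacements; "&" puts each byte back,
    # so the string is unchanged and only the count matters.
    print gsub(/[^\200-\277]/, "&", s)
  }'
}

utf8_chars_bytemode 'αλεπού'   # 6
utf8_chars_bytemode 'abc'      # 3
```

On malformed input this counts lead bytes and stray ASCII only, so it can disagree with tools that treat each invalid byte as its own character.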





