Regular Expression Decoding
Regular expressions are great. They can be a quick and easy way to extract substrings of interest from a string. I use the awesome hyper crate in my day job to serve up a web API, and I use the hyper-router crate to define all of the routes for the web API using regular expressions. Oftentimes, those regular expressions are used something like this (stub code borrowed from hyper-router):
That code isn’t too terrible, but it ignores a lot of error handling for extracting the mac and version strings from the regular expression, as well as the error handling for converting that version string to an actual number. Now, in this case, the unwraps are probably okay since:
- We had to match against the regular expression for the endpoint_handler to be called, so we should successfully get a captures object from applying the regex to the URI.
- Both mac and version will exist.
- The version string will be a number that fits in a u64 because of the regex.
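For comparison, here is a rough sketch of what spelling out each of those checks might look like instead of unwrapping. It uses only the standard library, so a hand-rolled path split (`captures`, a hypothetical helper) stands in for applying the regex. The route shape `/device/<mac>/version/<number>`, the `ExtractError` type, and the `extract` function are all assumptions made for illustration.

```rust
use std::num::ParseIntError;

// Hypothetical error type: one variant per step that an unwrap would skip.
#[derive(Debug, PartialEq)]
enum ExtractError {
    NoMatch,
    MissingField(&'static str),
    BadVersion(ParseIntError),
}

// Stand-in for applying the route regex. It accepts paths shaped like
// "/device/<mac>/version/<number>" (an assumed route) and returns the
// two "captured" substrings in order.
fn captures(uri: &str) -> Option<Vec<&str>> {
    let parts: Vec<&str> = uri.trim_start_matches('/').split('/').collect();
    if parts.len() == 4 && parts[0] == "device" && parts[2] == "version" {
        Some(vec![parts[1], parts[3]])
    } else {
        None
    }
}

// Every step gets its own check, which is where the repetition comes from.
fn extract(uri: &str) -> Result<(String, u64), ExtractError> {
    // Step 1: did the pattern match at all?
    let caps = captures(uri).ok_or(ExtractError::NoMatch)?;
    // Steps 2 and 3: is each field present? (Always true here, which is
    // the same reasoning that makes the unwraps tolerable.)
    let mac = caps.get(0).ok_or(ExtractError::MissingField("mac"))?;
    let version = caps.get(1).ok_or(ExtractError::MissingField("version"))?;
    // Step 4: does the version string convert to a u64?
    let version = version.parse::<u64>().map_err(ExtractError::BadVersion)?;
    Ok((mac.to_string(), version))
}
```

Calling `extract("/device/aabbccddeeff/version/42")` yields the mac as a `String` and the version as a `u64`, while a path that does not fit the assumed route comes back as `ExtractError::NoMatch`.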
However, the code gets very repetitive. What if we could just do something like this?
That is what the regex-decode crate does for you. It provides the ability to decode the named capture groups of a regular expression into a struct. Additionally, it will do type conversions from strings into other primitive types that are defined in the struct. So, in the above example, it hides away the boilerplate of extracting the named captures and the type conversion and gives you a single place to check for errors, instead of every step along the way.
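To give a feel for what gets hidden away, here is a minimal std-only sketch of the idea. It is not the regex-decode API: the `HashMap` stands in for the named capture groups, and `Endpoint`, `DecodeError`, `field`, and `decode_endpoint` are made-up names. The crate works from the struct's `RustcDecodable` implementation, whereas this sketch writes the per-field work by hand.

```rust
use std::collections::HashMap;
use std::str::FromStr;

// The target struct: field names mirror the (assumed) capture group names.
#[derive(Debug, PartialEq)]
struct Endpoint {
    mac: String,
    version: u64,
}

// Hypothetical error type; each variant carries the offending group name.
#[derive(Debug, PartialEq)]
enum DecodeError {
    MissingGroup(String),
    BadValue(String),
}

// Looks up one named group and converts it to the requested type via FromStr.
fn field<T: FromStr>(caps: &HashMap<&str, &str>, name: &str) -> Result<T, DecodeError> {
    let raw = caps
        .get(name)
        .ok_or_else(|| DecodeError::MissingGroup(name.to_string()))?;
    raw.parse::<T>()
        .map_err(|_| DecodeError::BadValue(name.to_string()))
}

// All lookups and conversions funnel into one Result for the caller to check.
fn decode_endpoint(caps: &HashMap<&str, &str>) -> Result<Endpoint, DecodeError> {
    Ok(Endpoint {
        mac: field(caps, "mac")?,
        version: field(caps, "version")?,
    })
}
```

The caller checks a single `Result` from `decode_endpoint`; a missing group or a value that does not parse surfaces there, tagged with the group name.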
If defining a struct is too heavy-weight for you, just have it decode to a tuple:
In this case, we are not decoding by the names of the capture groups and are instead just grabbing the captures by position. Plus, that still gives you the type safety that the struct would give you.
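The positional idea can be sketched with only the standard library as well. Again, this is an illustration under assumptions and not the crate's API: a slice of strings stands in for the capture groups, and `at` and `decode_pair` are hypothetical helpers. The element types of the tuple drive the conversions, which is where the type safety comes from.

```rust
use std::str::FromStr;

// Fetches the capture at `index` and converts it to the requested type.
fn at<T: FromStr>(caps: &[&str], index: usize) -> Result<T, String> {
    let raw = caps
        .get(index)
        .ok_or_else(|| format!("capture {} is missing", index))?;
    raw.parse::<T>()
        .map_err(|_| format!("capture {} does not convert: {:?}", index, raw))
}

// Decodes the first two captures, by position, into a typed pair.
fn decode_pair<A: FromStr, B: FromStr>(caps: &[&str]) -> Result<(A, B), String> {
    Ok((at(caps, 0)?, at(caps, 1)?))
}
```

Annotating the result as `Result<(String, u64), String>` picks the conversions. Asking for `(String, u8)` instead would make a version of `300` fail to decode, because it does not fit in a `u8`.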
You should be able to use any type you want as long as it is RustcDecodable. However, recursive types do not make a whole lot of sense when you are decoding the captures of a regular expression, so those are not handled. Also, I have not figured out a good way to decode tuple structs yet, so the library will emit an error if you try to decode to a tuple struct. But for the use case of quickly decoding to a simple struct, the crate works just fine.