RegEx, Data Classes and Type Hints with Python: Learning from tweet text

Let’s use the re module and data classes to check whether a given tweet text is valid according to our fictional business rule. Type hints can help with gradual typing. Come along!

  • After the mention, the word “concedes” must be present. It is identified as the trigger keyword.
  • After the keyword, at least one hashtag must be present. The tweet may contain more than one if the user wishes to.
  • Hashtags must be written in Upper Camel Case (also known as Pascal Case) so that the separate words can be identified and turned into slugs. For instance, #VillainJafar turns into villain-jafar.

Starting from the test

It’s pretty typical to start from the tests when you have a well-defined business rule. This development approach is known as test-driven development (TDD). Consider the sample given with #VillainJafar:

def test_should_evaluate_text_as_valid_given_keyword_and_one_hashtag_presence():
    # Arrange
    sample_tweet = "@GenieOfTheLamp concedes #VillainJafar"
    # Act
    result = None
    # Assert
    assert result.is_valid
    assert result.hashtags == ["VillainJafar"]
    assert result.slugs == ["villain-jafar"]


def test_should_evaluate_text_as_valid_given_keyword_and_two_hashtags_presence():
    # Arrange
    sample_tweet = "@GenieOfTheLamp concedes #FirstWish #SecondWish"
    # Act
    result = None
    # Assert
    assert result.is_valid
    assert result.hashtags == ["FirstWish", "SecondWish"]
    assert result.slugs == ["first-wish", "second-wish"]

We also need tests for the invalid scenarios:

  • The text is missing the keyword.
  • The text has the wrong keyword.
  • The text is None or empty.

import pytest  # needed for pytest.raises in the last test


def test_should_evaluate_text_as_invalid_given_missing_keyword():
    # Arrange
    sample_tweet = "@GenieOfTheLamp #FirstWish #SecondWish"
    # Act
    result = None
    # Assert
    assert not result.is_valid


def test_should_evaluate_text_as_invalid_given_wrong_keyword():
    # Arrange
    sample_tweet = "@GenieOfTheLamp creates #FirstWish #SecondWish"
    # Act
    result = None
    # Assert
    assert not result.is_valid


def test_should_throw_exception_when_text_is_none_or_empty():
    # Act and assert
    with pytest.raises(Exception):
        # Empty String case
        pass
    with pytest.raises(Exception):
        # None as an argument
        pass

Defining method contract using type hints and data classes

As you’ve seen in our tests, we didn’t define our method contract. Let’s start with the following:

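# First draft: only the argument is typed
def check_text_and_grab_its_details(text: str):
    pass

# Second draft: the return value is described by a frozen data class
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class TextDetails:
    is_valid: bool
    hashtags: Optional[List[str]] = None
    slugs: Optional[List[str]] = None

def check_text_and_grab_its_details(text: str) -> TextDetails:
    pass

# The optional fields default to None, so both calls are valid
TextDetails(True)
TextDetails(True, ["FirstWish"], ["first-wish"])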
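
To make the contract a bit more tangible, here is a minimal sketch of how the frozen data class behaves at runtime. It only relies on the standard `dataclasses` module; the variable names `invalid` and `valid` are made up for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TextDetails:
    is_valid: bool
    hashtags: Optional[List[str]] = None
    slugs: Optional[List[str]] = None


invalid = TextDetails(False)
valid = TextDetails(True, ["FirstWish"], ["first-wish"])

# The generated __repr__ and __eq__ make assertions in tests straightforward
print(invalid)  # TextDetails(is_valid=False, hashtags=None, slugs=None)
print(valid == TextDetails(True, ["FirstWish"], ["first-wish"]))  # True

# frozen=True forbids reassigning a field after creation
try:
    valid.is_valid = False
except FrozenInstanceError as error:
    print(f"Cannot mutate: {error}")
```

Keep in mind that `frozen=True` only blocks reassignment of the fields. The lists stored in `hashtags` and `slugs` are still ordinary mutable lists.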

Method implementation and the RE module

I used the site RegExr to build and validate the regular expression. If you don’t know how regular expressions work, I recommend the book Piazinho. The pattern can be defined in our Python code as follows:

regex_valid_text = re.compile(r".* ?@GenieOfTheLamp concedes ( ?#([a-zA-Z]{1,}))+$")
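
If you want to try the pattern outside RegExr, a quick check like the one below may help. It is only a sketch: the `is_match` helper is hypothetical, the first four sample texts come from the tests above, and the last one is made up.

```python
import re

# Pattern copied verbatim from the article
regex_valid_text = re.compile(r".* ?@GenieOfTheLamp concedes ( ?#([a-zA-Z]{1,}))+$")


def is_match(text: str) -> bool:
    # Hypothetical helper, only here to keep the checks short
    return regex_valid_text.match(text) is not None


print(is_match("@GenieOfTheLamp concedes #VillainJafar"))           # True
print(is_match("@GenieOfTheLamp concedes #FirstWish #SecondWish"))  # True
print(is_match("@GenieOfTheLamp #FirstWish #SecondWish"))           # False
print(is_match("@GenieOfTheLamp creates #FirstWish #SecondWish"))   # False
print(is_match("@GenieOfTheLamp concedes"))                         # False

# A repeated group only remembers its last repetition
match = regex_valid_text.match("@GenieOfTheLamp concedes #FirstWish #SecondWish")
last_hashtag = match.group(2) if match else None
print(last_hashtag)  # SecondWish
```

Two things are worth noticing. The character class accepts any mix of letters, so the pattern alone does not enforce Upper Camel Case. And since a repeated group keeps only its last capture, collecting every hashtag needs something else, such as a dedicated pattern used with `findall`.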

@dataclass(frozen=True)
class TextDetails:
    is_valid: bool
    hashtags: Optional[List[str]] = None
    slugs: Optional[List[str]] = None

regex_valid_text = re.compile(r".* ?@GenieOfTheLamp concedes ( ?#([a-zA-Z]{1,}))+$")
regex_camel_case_conversion = re.compile(r"(?!^)([A-Z]+)")

def check_text_and_grab_its_details(text: str) -> TextDetails:
    cleaned_text = strip_left_and_right_sides(text)
    if not cleaned_text:
        raise TextIsFalsyException
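
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the article does not show these two helpers; this is a minimal guess
class TextIsFalsyException(Exception):
    pass

def strip_left_and_right_sides(text: str) -> str:
    # Assumed to wrap str.strip(); passing None then raises AttributeError
    return text.strip()

@dataclass(frozen=True)
class TextDetails:
    is_valid: bool
    hashtags: Optional[List[str]] = None
    slugs: Optional[List[str]] = None

pattern_valid_text = re.compile(r".* ?@GenieOfTheLamp concedes (#[a-zA-Z]{1,} ?)+$")
pattern_hashtags = re.compile(r"(#[a-zA-Z]{1,})")
pattern_camel_case_conversion = re.compile(r"(?!^)([A-Z]+)")

def check_text_and_grab_its_details(text: str) -> TextDetails:
    cleaned_text = strip_left_and_right_sides(text)
    if not cleaned_text:
        raise TextIsFalsyException
    # Match the stripped text so stray surrounding whitespace does not matter
    match = pattern_valid_text.match(cleaned_text)
    if not match:
        return TextDetails(False)
    all_hashtags = pattern_hashtags.findall(cleaned_text)
    hashtags = []
    slugs = []
    for hashtag in all_hashtags:
        tag = hashtag.replace("#", "")
        # Put a dash before every uppercase run that is not at the start
        almost_slug = pattern_camel_case_conversion.sub(r"-\1", tag)
        slug = almost_slug.lower()
        hashtags.append(tag)
        slugs.append(slug)
    return TextDetails(True, hashtags, slugs)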
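
The slug conversion is the least obvious part of the implementation, so here is a small sketch that isolates it. The `to_slug` function name is hypothetical; the pattern and the `-\1` replacement are the ones used above. The negative lookahead `(?!^)` prevents a match at the very start of the string, so no leading dash is produced.

```python
import re

pattern_camel_case_conversion = re.compile(r"(?!^)([A-Z]+)")


def to_slug(tag: str) -> str:
    # Hypothetical helper: dash before each uppercase run, then lowercase
    return pattern_camel_case_conversion.sub(r"-\1", tag).lower()


print(to_slug("VillainJafar"))  # villain-jafar
print(to_slug("FirstWish"))     # first-wish
print(to_slug("HTTPServer"))    # h-ttpserver
```

The last call shows a limitation: because `[A-Z]+` treats consecutive capitals as a single run, acronyms are not split the way you might expect. For our fictional business rule that is probably acceptable, but it is worth knowing.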

Running the tests

We just need to replace the following snippets:

# Act
result = None

# Act and assert
with pytest.raises(Exception):
    # Empty String case
    pass
with pytest.raises(Exception):
    # None as an argument
    pass

With these, respectively:

# Act
result = check_text_and_grab_its_details(sample_tweet)

# Act and assert
with pytest.raises(TextIsFalsyException):
    check_text_and_grab_its_details("")
with pytest.raises(AttributeError):
    check_text_and_grab_its_details(None)

The test run shows a list with 5 test cases, all of them passing.

Bonus: static analysis with mypy

Nowadays, a production-ready Python project should have a static type checker. You can type your project gradually, making it safer to work with and to ship to your environment. One that you can use is mypy, a static type checker for Python. The following script runs it together with isort and black:

#!/usr/bin/env bash

TARGET_PROJECT=regex_dataclasses
TARGET_TEST_PROJECT=tests
TARGET_FOLDERS="$TARGET_PROJECT $TARGET_TEST_PROJECT"

echo "######## ISORT..."
isort $TARGET_FOLDERS --check-only --diff
echo "######## BLACK..."
black --check --diff $TARGET_FOLDERS
echo "######## MYPY..."
# mypy will only target the project folder
mypy $TARGET_PROJECT

▶ ./scripts/start-lint.sh
######## ISORT...
######## BLACK...
All done! ✨ 🍰 ✨
7 files would be left unchanged.
######## MYPY...
Success: no issues found in 4 source files
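
To get a feel for what mypy adds on top of the data class, consider the small sketch below. The `first_slug` helper is hypothetical and not part of the article's code. Because `slugs` is declared as `Optional[List[str]]`, mypy should reject indexing it directly and only accept the access once the `None` case has been handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TextDetails:
    is_valid: bool
    hashtags: Optional[List[str]] = None
    slugs: Optional[List[str]] = None


def first_slug(details: TextDetails) -> str:
    # Hypothetical helper. Writing `return details.slugs[0]` right away would be
    # flagged by mypy, since `slugs` may be None. The check below narrows the type
    # and also covers the empty list.
    if not details.slugs:
        return ""
    return details.slugs[0]


print(first_slug(TextDetails(True, ["FirstWish"], ["first-wish"])))  # first-wish
print(repr(first_slug(TextDetails(False))))  # ''
```

This is the kind of mistake that would otherwise only show up at runtime, for instance when an invalid tweet reaches code that assumes slugs are always present.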

Conclusion

If you’ve worked with both statically typed and dynamic languages for a while, you’ll notice how quickly you can produce and deliver good-quality code once you start gradually typing your projects. Instead of typing everything, you can create types and apply them to the important places in your code. Some years ago I came to understand that code that is 100% dynamic or 100% typed can be a bad thing, depending, of course, on the context you are in. Luckily for us, static type checkers enable Python projects to support gradual typing, and data classes are a fantastic way to help with it. This technique must be used wisely, though, or it will bring up more problems than it solves.
