Formatting tweets: a look at Extended tweets, Retweets and Quotes
One thing I’ve noticed on thefeed.press is that the conversations (the tweets) surrounding shared links are sometimes more interesting than the link itself. Placing proper emphasis on these tweets means displaying them wherever necessary, the email digest for example. And displaying them means formatting them properly.
Introduction
To display a tweet properly, it needs to be well formatted. This means identifying and linking entities like usernames, hashtags and URLs. In simple terms, it is converting a typical tweet object [1] like this:
to this:
Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.
Notice that the tweet object’s text field is plain unformatted text, but there is an additional entities object with the necessary details for formatting. You probably won’t need to write a library to match and replace the entities in the text though. Twitter provides Twitter Text, an amazing library to do this.
This is a representation in Node.js.
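To make the idea concrete without any dependency, here is a minimal hand-rolled sketch of what an entity linker does. It is not Twitter Text and not necessarily how the original example was written. The function name linkEntities, the link URL patterns and the hand-computed indices are assumptions for illustration; the entity field names (hashtags, user_mentions, urls, indices) follow the v1.1 tweet object.

```javascript
// Sketch only: a tiny stand-in for what a library like Twitter Text does.
// Assumption: text between entities is emitted as-is, so escape it first
// if your source does not already return HTML-safe text.

function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function linkEntities(text, entities = {}) {
  // Entity indices count Unicode code points, not UTF-16 units,
  // so split the text into code points before slicing.
  const chars = Array.from(text);

  const links = [
    ...(entities.hashtags || []).map(h => ({
      indices: h.indices,
      href: `https://twitter.com/hashtag/${encodeURIComponent(h.text)}`,
      label: `#${h.text}`,
    })),
    ...(entities.user_mentions || []).map(m => ({
      indices: m.indices,
      href: `https://twitter.com/${encodeURIComponent(m.screen_name)}`,
      label: `@${m.screen_name}`,
    })),
    ...(entities.urls || []).map(u => ({
      indices: u.indices,
      href: u.expanded_url || u.url,
      label: u.display_url || u.url,
    })),
  ].sort((a, b) => a.indices[0] - b.indices[0]);

  let html = '';
  let cursor = 0;
  for (const { indices: [start, end], href, label } of links) {
    html += chars.slice(cursor, start).join(''); // plain text before the entity
    html += `<a href="${escapeHtml(href)}">${escapeHtml(label)}</a>`;
    cursor = end;
  }
  return html + chars.slice(cursor).join('');
}

// The tweet from the introduction; indices computed by hand for this example.
const tweet = {
  text: 'Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.',
  entities: {
    hashtags: [{ text: 'WeAreNigerianCreatives', indices: [32, 55] }],
  },
};
console.log(linkEntities(tweet.text, tweet.entities));
```

In practice the library handles many more cases (cashtags, media, display URLs), so treat this only as a picture of the mechanics.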
Say hello to extended tweets
For tweets over 140 characters, the tweet object only returns 140 characters of text by default. In this compatibility mode:
- text is truncated to 140 characters
- truncated is set to true for tweets that are more than 140 characters
- entities only include those in the available 140-character text range
Here is an example tweet object:
Formatting it gives this:
I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend… https://twitter.com/i/web/status/972535628742078469 …
compared to the original tweet:
I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend 20 minutes learning X today,” and have to invest an additional 60 minutes just setting up the appropriate environment.
Mode: Extended
How to get the full text? Simple. Add the parameter tweet_mode=extended to any endpoint you are querying. So instead of https://api.twitter.com/1.1/statuses/show/972535628742078469.json, let’s try https://api.twitter.com/1.1/statuses/show/972535628742078469.json?tweet_mode=extended
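If you build request URLs in code, appending the parameter could look something like the sketch below. The helper name withExtendedMode is made up for this example, and it only builds the URL; authentication and the request itself are left out.

```javascript
// Hypothetical helper: adds tweet_mode=extended to any endpoint URL,
// keeping whatever query parameters are already there.
function withExtendedMode(endpoint) {
  const url = new URL(endpoint); // URL is a global in Node.js
  url.searchParams.set('tweet_mode', 'extended');
  return url.toString();
}

// Tweet IDs exceed Number.MAX_SAFE_INTEGER, so keep them as strings.
const id = '972535628742078469';
console.log(withExtendedMode(`https://api.twitter.com/1.1/statuses/show/${id}.json`));
```

Using searchParams.set rather than string concatenation means the helper is safe to call on a URL that already has a query string or already has tweet_mode set.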
Yeah, that simple. Notice that:
- full_text replaces text
- truncated is false
- display_text_range identifies the start and end of the displayable content of the tweet.
You can then go ahead and format using full_text and entities.
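Since the same code may see tweets from both modes, a small helper that picks the right field can be handy. This is only a sketch: displayText is a made-up name, and it assumes display_text_range counts code points the same way entity indices do.

```javascript
// Sketch: return the displayable text of a tweet from either mode.
function displayText(tweet) {
  // Extended mode provides full_text; compatibility mode only has text.
  const text = tweet.full_text !== undefined ? tweet.full_text : tweet.text;
  const chars = Array.from(text); // slice by code point, not UTF-16 unit
  const [start, end] = tweet.display_text_range || [0, chars.length];
  return chars.slice(start, end).join('');
}

console.log(displayText({
  full_text: '@a hello https://t.co/x',
  display_text_range: [3, 8],
})); // the range covers only "hello"
```

Note that if you cut the text down to display_text_range before formatting, the entity indices have to be shifted by the same offset, which is one reason to format the untouched full_text instead.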
Hmmm…retweets
Here is a retweet requested in extended mode.
Notice how full_text is truncated even though truncated says false. What could be wrong? Well, texts in retweets are prefixed with RT @username: and if the resulting text is more than 140 characters, it will be truncated.
What to do? Use the retweeted_status instead. The retweeted_status object contains the full text and entities you need. Just check if retweeted_status exists and use that instead.
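That check is a one-liner. A minimal sketch, with sourceTweet as a made-up name:

```javascript
// Sketch: for a retweet, format the original tweet rather than the
// "RT @username: ..." wrapper, whose text may be cut short.
function sourceTweet(tweet) {
  return tweet.retweeted_status || tweet;
}

const retweet = {
  full_text: 'RT @someone: a long tweet that gets cut…',
  retweeted_status: { full_text: 'a long tweet that gets cut off in the wrapper' },
};
console.log(sourceTweet(retweet).full_text);
```

If you still want to show who retweeted it, keep a reference to the outer tweet’s user before swapping.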
Quotes :/
Quotes are in an entirely different world of their own. You need to see what a quoted tweet looks like to understand.
The full_text does not tell the complete story. It does not include the tweet that was quoted. The quoted tweet is hidden somewhere in quoted_status. And unlike retweets, where you can replace the tweet with the retweeted status, you need both the original and additional tweet to make complete sense of a quote. Here is what quoted_status looks like:
So what do we do in this case? What we need to achieve is something like this:
Added tweets to the daily newsletter for better context
@thefeedpress:
New newsletter screenshot pic.twitter.com/HQmJumZfhN
And it seems we just need to format the quoted tweet and additional tweet separately and show them together.
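Formatting the two parts separately and joining them could be sketched like this. The name formatQuote and the plain-text layout are assumptions for illustration; format stands for whatever formatter you already use for a single tweet, and quoted_status.user.screen_name is the quoted author’s handle.

```javascript
// Sketch: format the additional tweet and the quoted tweet separately,
// then show them together. `format` is any function that turns one
// tweet object into display text or HTML.
function formatQuote(tweet, format) {
  const own = format(tweet);
  if (!tweet.quoted_status) return own; // not a quote, nothing to append

  const quoted = tweet.quoted_status;
  return `${own}\n@${quoted.user.screen_name}:\n${format(quoted)}`;
}

// Usage with a trivial formatter that just returns the raw text.
const quote = {
  full_text: 'Added tweets to the daily newsletter for better context.',
  quoted_status: {
    full_text: 'New newsletter screenshot',
    user: { screen_name: 'thefeedpress' },
  },
};
console.log(formatQuote(quote, t => t.full_text));
```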
Added tweets to the daily newsletter for better context. https://twitter.com/thefeedpress/status/941880801087680512 …
@thefeedpress:
New newsletter screenshot pic.twitter.com/HQmJumZfhN
Looks pretty close. But the additional tweet has a link to the embedded quote. Can we remove this link though? Let’s try.
Since we know the link to the quoted status will always end the additional tweet text, we can match the end of the text against a link of the form https://twitter.com/[quoted_status_user_username]/status/[0-9]+ and remove it. There are a couple of issues with this though. If we match the unformatted text, the URL will still be in the format http://t.co/\w+ (unexpanded) and not https://twitter.com/[quoted_status_user_username]/status/[0-9]+ (expanded). If we match after formatting, the link will have been expanded but will contain HTML tags that break our regular expression [2].
Well, since we know the link will always end the text, we can remove any ending link in the unformatted text. We can also remove that link’s entry (and its indices) from the entities before we proceed to format the text.
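One way to do both steps at once is to lean on the entity indices instead of a regular expression. The sketch below is an assumption about how this could be done, not the exact code behind thefeed.press; stripTrailingLink is a made-up name, and it assumes the trailing link is listed in entities.urls.

```javascript
// Sketch: drop a link that ends the text, together with its url entity.
// Returns a new tweet-like object; the input is left untouched.
function stripTrailingLink(tweet) {
  const chars = Array.from(tweet.full_text); // indices count code points
  const urls = (tweet.entities && tweet.entities.urls) || [];
  if (urls.length === 0) return tweet;

  // The candidate is the url entity that starts last in the text.
  const last = urls.reduce((a, b) => (b.indices[0] > a.indices[0] ? b : a));
  const [start, end] = last.indices;

  // Only strip it if nothing but whitespace follows the link.
  if (chars.slice(end).join('').trim() !== '') return tweet;

  return {
    ...tweet,
    full_text: chars.slice(0, start).join('').trimEnd(),
    entities: { ...tweet.entities, urls: urls.filter(u => u !== last) },
  };
}

const text = 'Added tweets to the daily newsletter for better context. https://t.co/Q46O3husnz';
const start = text.indexOf('https://');
const cleaned = stripTrailingLink({
  full_text: text,
  entities: { urls: [{ url: 'https://t.co/Q46O3husnz', indices: [start, text.length] }] },
});
console.log(cleaned.full_text, cleaned.entities.urls.length);
```

Because the link sits at the very end, removing it does not shift the indices of any other entity, so the remaining entities can be formatted as usual. For a quote you would only call this when quoted_status is present.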
Conclusion
This is all you will probably need. But there is still more to do. What about displaying media (pictures, videos) within the tweet? Quotes within quotes? Threaded replies?
If you really want to do it, formatting tweets can be a complex thing. But you really don’t have to do it if not necessary. You can use embedded tweets instead.
1. Some fields have been removed from this tweet object, and from the others used in this piece, for brevity.
2. Here is what the formatted HTML for the link https://twitter.com/thefeedpress/status/941880801087680512 looks like:
<a href="https://t.co/Q46O3husnz" title="https://twitter.com/thefeedpress/status/941880801087680512" rel="nofollow"><span class='tco-ellipsis'><span style='position:absolute;left:-9999px;'> </span></span><span style='position:absolute;left:-9999px;'>https://</span><span class='js-display-url'>twitter.com/thefeedpress/s</span><span style='position:absolute;left:-9999px;'>tatus/941880801087680512</span><span class='tco-ellipsis'><span style='position:absolute;left:-9999px;'> </span>…</span></a>