Formatting tweets: a look at Extended tweets, Retweets and Quotes
One thing I’ve noticed on thefeed.press is that the conversations (the tweets) surrounding shared links are sometimes more interesting than the link itself. Placing proper emphasis on these tweets means displaying them wherever necessary, the email digest for example. And displaying them means formatting them properly.
To display a tweet properly, it needs to be well formatted. This means identifying and linking entities like usernames, hashtags and URLs. In simple terms, it means turning a typical tweet object¹ into a formatted tweet like this:
Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.
Notice that the tweet object’s text is plain, unformatted text, but there is an additional entities object with the details needed for formatting. You probably won’t need to write a library to match and replace the entities in the text, though. Twitter provides Twitter Text, an amazing library that does this.
This is a representation in Node.js.
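To make the idea concrete, here is a minimal sketch of what entity-based linking involves, written in plain Node.js with no dependencies. It is a simplified stand-in and not how Twitter Text itself is implemented: the linkEntities helper, the hand-built sampleTweet object and the link URL patterns are all assumptions made for illustration, and entity indices are treated as code point offsets.

```javascript
// Hypothetical helper: wraps each entity in an <a> tag using its indices.
// Assumes the text is already safe to embed in HTML; escape it first if not.
function linkEntities(text, entities) {
  // Split by code point so astral characters (emoji) count as one position.
  const chars = Array.from(text);

  // Normalise the three entity types into { indices, href, label } records.
  const links = [
    ...(entities.hashtags || []).map(e => ({
      indices: e.indices,
      href: 'https://twitter.com/hashtag/' + encodeURIComponent(e.text)
    })),
    ...(entities.user_mentions || []).map(e => ({
      indices: e.indices,
      href: 'https://twitter.com/' + encodeURIComponent(e.screen_name)
    })),
    ...(entities.urls || []).map(e => ({
      indices: e.indices,
      href: e.url,
      label: e.display_url
    }))
  ].sort((a, b) => a.indices[0] - b.indices[0]);

  let html = '';
  let cursor = 0;
  for (const { indices: [start, end], href, label } of links) {
    // Copy the plain text before the entity, then the linked entity itself.
    html += chars.slice(cursor, start).join('');
    const shown = label || chars.slice(start, end).join('');
    html += '<a href="' + href + '">' + shown + '</a>';
    cursor = end;
  }
  return html + chars.slice(cursor).join('');
}

// Hand-built stand-in for a tweet object; only the fields used here are kept.
const sampleTweet = {
  text: 'Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.',
  entities: {
    hashtags: [{ text: 'WeAreNigerianCreatives', indices: [32, 55] }],
    user_mentions: [],
    urls: []
  }
};

console.log(linkEntities(sampleTweet.text, sampleTweet.entities));
```

Twitter Text handles far more cases than this, so a sketch like this one is mainly useful for seeing what the entities object is for.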
Say hello to extended tweets
For tweets over 140 characters, the tweet object only returns 140 characters of text by default. In this compatibility mode:
text is truncated to 140 characters
truncated is set to true for tweets that are more than 140 characters
entities only include those in the available 140-character text range
Here is an example tweet object
Formatting that will give this:
I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend… https://twitter.com/i/web/status/972535628742078469 …
compared to the original tweet:
I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend 20 minutes learning X today,” and have to invest an additional 60 minutes just setting up the appropriate environment.
How do we get the full text? Simple. Add the parameter tweet_mode=extended to any endpoint you are querying. So instead of https://api.twitter.com/1.1/statuses/show/972535628742078469.json, let’s try https://api.twitter.com/1.1/statuses/show/972535628742078469.json?tweet_mode=extended
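As a small sketch of that step, the parameter can be appended with Node.js’s built-in URL class. The withExtendedMode helper is a hypothetical name, and the snippet only builds the URL; authentication and the request itself are left out.

```javascript
// Hypothetical helper: returns the same endpoint with tweet_mode=extended set.
function withExtendedMode(endpoint) {
  const url = new URL(endpoint); // URL is available as a global in Node.js
  // set() adds the parameter, or overwrites it if it is already present.
  url.searchParams.set('tweet_mode', 'extended');
  return url.toString();
}

console.log(
  withExtendedMode('https://api.twitter.com/1.1/statuses/show/972535628742078469.json')
);
```

Using searchParams rather than string concatenation keeps any query parameters the endpoint already carries.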
Yeah, that simple. Notice that:
display_text_range identifies the start and end of the displayable content of the tweet.
You can then go ahead and format using full_text and entities.
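Here is a rough sketch of how display_text_range might be applied before formatting. The displayableText helper and the replyTweet object are made up for illustration, and the range is treated as code point offsets into the text.

```javascript
// Hypothetical helper: returns only the displayable part of a tweet's text.
function displayableText(tweet) {
  // Extended mode uses full_text; compatibility mode only has text.
  const text = tweet.full_text || tweet.text || '';
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  // Fall back to the whole text when no range is given.
  const [start, end] = tweet.display_text_range || [0, chars.length];
  return chars.slice(start, end).join('');
}

// Hand-built example: a reply whose leading mention is outside the range.
const replyTweet = {
  full_text: '@someone thanks for sharing',
  display_text_range: [9, 27]
};

console.log(displayableText(replyTweet));
```

Keep in mind that entity indices refer to positions in the full text, so if you slice the text first, the indices need to be shifted by the start of the range.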
Here is a retweet requested in extended mode.
full_text is truncated even though truncated is false. What could be wrong? Well, texts in retweets are prefixed with RT @username: and if the resulting text is more than 140 characters, it will be truncated.
What to do? Use the retweeted_status instead. The retweeted_status object contains the full text and entities you need.
Just check if retweeted_status exists and use that instead.
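That check can be sketched in a line or two. The tweetToFormat helper and both sample objects below are hypothetical and trimmed down to the fields that matter here.

```javascript
// Hypothetical helper: picks the object whose text and entities should be formatted.
function tweetToFormat(tweet) {
  // A retweet carries the original tweet, untruncated, in retweeted_status.
  return tweet.retweeted_status || tweet;
}

// Hand-built retweet: the outer full_text is cut off after the RT prefix.
const retweet = {
  full_text: 'RT @example: The beginning of a long tweet…',
  retweeted_status: {
    full_text: 'The beginning of a long tweet, followed by the rest of it.',
    entities: { hashtags: [], user_mentions: [], urls: [] }
  }
};

console.log(tweetToFormat(retweet).full_text);
```

A regular tweet has no retweeted_status, so it is returned as it is.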
Quotes are in an entirely different world of their own. You need to see what a quoted tweet looks like to understand.
full_text does not tell the complete story. It does not include the tweet that was quoted. The quoted tweet is hidden somewhere in quoted_status. And unlike retweets, where you can replace the tweet with the retweeted status, you need both the original and the additional tweet to make complete sense of a quote. Here is what quoted_status looks like:
So what do we do in this case? What we need to achieve is something like this:
Added tweets to the daily newsletter for better context
And it seems we just need to format the quoted tweet and additional tweet separately and show them together.
Added tweets to the daily newsletter for better context. https://twitter.com/thefeedpress/status/941880801087680512 …
Looks pretty close. But the additional tweet has a link to the embedded quote. Can we remove this link though? Let’s try.
Since we know the link to the quoted status will always end the additional tweet text, we can match the end of the text for a link with the format https://twitter.com/[quoted_status_user_username]/status/[0-9]+ and remove it. There are a couple of issues with this though. If we match the unformatted text, the URL will still be in the format http://t.co/\w+ (unexpanded) and not https://twitter.com/[quoted_status_user_username]/status/[0-9]+ (expanded). If we match after formatting, the link will have been expanded but will contain HTML tags that will break our regular expression².
Well, since we know the link will always end the text, we can remove any ending link in the unformatted text. We can also remove the index from the entities before we then proceed to format the text.
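A minimal sketch of that approach might look like the following. The stripTrailingLink helper is a hypothetical name, the quoteTweet object is hand-built from the example above, and the regular expression assumes the trailing link is an unexpanded t.co URL.

```javascript
// Hypothetical helper: removes a t.co link that ends the text, together with
// the matching entry in entities.urls, and returns new values.
function stripTrailingLink(text, entities) {
  const match = /\s*https?:\/\/t\.co\/\w+\s*$/.exec(text);
  if (!match) return { text, entities };

  const trimmed = text.slice(0, match.index);
  // Entity indices are treated as code point offsets, so measure the cut that way.
  const cut = Array.from(trimmed).length;
  // Keep only the URL entities that start before the removed link.
  const urls = (entities.urls || []).filter(u => u.indices[0] < cut);
  return { text: trimmed, entities: { ...entities, urls } };
}

// Hand-built stand-in for the additional tweet of a quote.
const quoteTweet = {
  full_text: 'Added tweets to the daily newsletter for better context. https://t.co/Q46O3husnz',
  entities: {
    hashtags: [],
    user_mentions: [],
    urls: [{
      url: 'https://t.co/Q46O3husnz',
      expanded_url: 'https://twitter.com/thefeedpress/status/941880801087680512',
      indices: [57, 80]
    }]
  }
};

const stripped = stripTrailingLink(quoteTweet.full_text, quoteTweet.entities);
console.log(stripped.text);
```

It is probably safest to call this only when quoted_status exists, so that a regular tweet ending with a link keeps it.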
This is all you will probably need. But there is still more to do. What about displaying media (pictures, videos) within the tweet? Quotes within quotes? Threaded replies?
Formatting tweets yourself can get complex if you go all the way. But you don’t have to do it unless it is necessary; you can use embedded tweets instead.
¹ Some items have been removed from this tweet object, and from the others used in this piece, for brevity.
² Here is what the formatted HTML for the link looks like:
<a href="https://t.co/Q46O3husnz" title="https://twitter.com/thefeedpress/status/941880801087680512" rel="nofollow"><span class='tco-ellipsis'><span style='position:absolute;left:-9999px;'> </span></span><span style='position:absolute;left:-9999px;'>https://</span><span class='js-display-url'>twitter.com/thefeedpress/s</span><span style='position:absolute;left:-9999px;'>tatus/941880801087680512</span><span class='tco-ellipsis'><span style='position:absolute;left:-9999px;'> </span>…</span></a>