* feat: Relax URL matching
Instead of only linkifying things with an explicit http or https scheme,
the xurls.Relaxed also matches links with known TLDs. This means that
text like 'banana.com' will also be matched, despite the missing
http/https scheme. This also works to linkify email addresses, which is
handy.
This should also ensure we catch links without a scheme for the purpose
of spam checking.
* [feature] Parse content warning as HTML, serialize via API to plaintext
* tidy up some cruft
* whoops
* oops
* i'm da joker baybee
* clemency muy lorde
* rename some of the text functions for clarity
* jiggle the opts
* fiddle de deee
* hopefully the last test fix i ever have to do in my beautiful life
* [chore] Remove years from all license headers
Years or year ranges aren't required in license headers. Many projects
have removed them in recent years and it avoids a bit of yearly toil.
In many cases our copyright claim was also a bit dodgy since we added
the 2021-2023 header to files created after 2021 but you can't claim
copyright into the past that way.
* [chore] Add license header check
This ensures a license header is always added to any new file. This
avoids maintainers/reviewers needing to remember to check for and ask
for it in case a contribution doesn't include it.
* [chore] Add missing license headers
* [chore] Further updates to license header
* Use the more common // indentend comment format
* Remove the hack we had for the linter now that we use the // format
* Add SPDX license identifier
This adds a lightweight form of tracing to GTS. Each incoming request is
assigned a Request ID which we then pass on and log in all our log
lines. Any function that gets called downstream from an HTTP handler
should now emit a requestID=value pair whenever it logs something.
Co-authored-by: kim <grufwub@gmail.com>
* Implement goldmark debug print for hashtags and mentions
* Minify HTML in FromPlain
* Convert plaintext status parser to goldmark
* Move mention/tag/emoji finding logic into formatter
* Combine mention and hashtag boundary characters
* Normalize unicode when rendering hashtags
* move caption sanitization -> sanitize.go
* use sanitizeplaintext rather than removehtml
* rename sanitizecaption to sanitizeplaintext
* avoid removing html twice from statuses
* unexport remoteHTML
it's no longer used outside the text package so this
makes it less confusing
* test instance PATCH
* fix existing bio text showing as HTML
- updated replaced mentions to include instance
- strips HTML from account source note in Verify handler
- update text formatter to use buffers for string writes
Signed-off-by: kim <grufwub@gmail.com>
* go away linter
Signed-off-by: kim <grufwub@gmail.com>
* change buf reset location, change html mention tags
Signed-off-by: kim <grufwub@gmail.com>
* reduce FindLinks code complexity
Signed-off-by: kim <grufwub@gmail.com>
* fix HTML to text conversion
Signed-off-by: kim <grufwub@gmail.com>
* Update internal/regexes/regexes.go
Co-authored-by: Mina Galić <mina.galic@puppet.com>
* use improved html2text lib with more options
Signed-off-by: kim <grufwub@gmail.com>
* fix to produce actual plaintext from html
Signed-off-by: kim <grufwub@gmail.com>
* fix span tags instead written as space
Signed-off-by: kim <grufwub@gmail.com>
* performance improvements to regex replacements, fix link replace logic for un-html-ing in the future
Signed-off-by: kim <grufwub@gmail.com>
* fix tag/mention replacements to use input string, fix link replace to not include scheme
Signed-off-by: kim <grufwub@gmail.com>
* use matched input string for link replace href text
Signed-off-by: kim <grufwub@gmail.com>
* remove unused code (to appease linter :sobs:)
Signed-off-by: kim <grufwub@gmail.com>
* improve hashtagFinger regex to be more compliant
Signed-off-by: kim <grufwub@gmail.com>
* update breakReplacer to include both unix and windows line endings
Signed-off-by: kim <grufwub@gmail.com>
* add NoteRaw field to Account to store plaintext account bio, add migration for this, set for sensitive accounts
Signed-off-by: kim <grufwub@gmail.com>
* drop unnecessary code
Signed-off-by: kim <grufwub@gmail.com>
* update text package tests to fix logic changes
Signed-off-by: kim <grufwub@gmail.com>
* add raw note content testing to account update and account verify
Signed-off-by: kim <grufwub@gmail.com>
* remove unused modules
Signed-off-by: kim <grufwub@gmail.com>
* fix emoji regex
Signed-off-by: kim <grufwub@gmail.com>
* fix replacement of hashtags
Signed-off-by: kim <grufwub@gmail.com>
* update code comment
Signed-off-by: kim <grufwub@gmail.com>
Co-authored-by: Mina Galić <mina.galic@puppet.com>