Meta-Press.es

Decentralized search engine & automatized press reviews

Version 1.8.17.4 : Mozilla quality enforcement, manifest v3 and Scrutari

Things got a little rushed for Meta-Press.es in mid-September, with a series of releases in quick succession responding to an emergency: Meta-Press.es had been disabled by Mozilla, due to the automated enforcement of new code quality rules.

1. Mozilla automated add-ons code quality checks

Everything began with an email received on the 17th of September 2024, warning me that Meta-Press.es would soon be disabled on Addons.Mozilla.org (aka AMO). Apparently I had missed a previous email, allegedly sent two weeks earlier (though I rarely miss an email).

1.1. Missing sources or instructions

The reason invoked for this radical measure was that the Meta-Press.es WebExtension was missing « sources or instructions » on how to get the original sources of embedded dependencies.

When embedding minified third-party libraries, the rule was to also provide a link to the sources of each library. So, for every release over the last 5 years, I provided (copy/pasted from the README file) the list of official websites of every dependency of Meta-Press.es.

The review history, where the message appeared, is presented like a chat. I got this message:

Sources, specifically Sources or instructions missing: Your add-on contains minified, concatenated or otherwise machine-generated code. You need to provide the original sources, together with instructions on how to generate the exact same code used in the add-on. Source code must be provided as an archive and uploaded using the source code upload field, which can be done during submission or on the version page in the developer hub. Instructions can be provided in a top-level README file inside the source code package or in the "Notes to Reviewers" field on the version page in the developer hub.

— Initial message from the Add-ons review Team

And answered by :

Hi, I provided links to non-minified code in the Reviewer’s notes. Did I missed one ?

— Answer from a developer by Siltaar
2024-09-19 15:09

Well, I never got an answer, and the add-on was disabled 48 hours later, along with all of its previous versions from the last 5 years. Suddenly you no longer appear in the add-on search results, and the add-on’s URL on AMO returns a 404.

1.2. Please provide the origin of the exact library version

I issued a new version with no minified code, replacing 14 minified CSS or JavaScript files with their original versions (updated for the occasion). After all, libre JavaScript is not minified. But this new release was rejected, with the same message as before plus a new one:

  • Sources, specifically Third party library information: Your add-on includes a third-party library. Please provide the origin of the exact library version you were using and make sure you are using an exact copy of the original maintainer’s release version.

— Message from the Add-ons review Team

OK, this also makes sense: I imagine they built an automated verification of embedded third-party libraries and need us to provide exact links (whereas a human reviewer could previously make do with the official landing page of each dependency). And this time I got a link to online documentation about how to deal with third-party libraries.


So I issued a new version with exact links to the official release of each library. If a library does not publish its release code online, you can’t use it. Again, this is what it takes to set up automated verification of third-party libraries, and it is reassuring to know that there is no mysterious code in Mozilla’s add-ons.

But CodeMirror v5, for instance, spreads its development code over hundreds of files and compiles them into minified single-file releases (packaged in .zip files), which are simpler to load in your web pages. As it also relies on plugins, with their own possible dependencies (to highlight JSON and JSON errors, for instance), it turned out to be too big a maintenance burden for Meta-Press.es. New sources are now added as raw text in a standard textarea. CodeMirror v6 might fit here one day.
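How AMO’s tooling actually compares libraries is not described here. One way to check locally, before submitting, that a vendored file is byte-for-byte the maintainer’s release is to compare the SHA-256 digest of the vendored file with that of the file downloaded from the exact release URL. A minimal Node.js sketch, assuming both files are already read into memory; `sha256Hex` and `compareWithUpstream` are hypothetical helper names:

```javascript
// Hypothetical helpers: compare a vendored library file with the upstream release file.
const { createHash } = require("node:crypto");

// Returns the SHA-256 digest of a string or Buffer as a hex string.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Both arguments are the raw contents (string or Buffer) of the two files.
function compareWithUpstream(vendoredBytes, upstreamBytes) {
  const vendored = sha256Hex(vendoredBytes);
  const upstream = sha256Hex(upstreamBytes);
  return { identical: vendored === upstream, vendored, upstream };
}
```

The first argument could come from `fs.readFileSync()` on the vendored file and the second from a download of the exact release URL given in the reviewer notes. Any difference, even a re-minification or an added license header, changes the digest.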

1.3. No more one letter variables

Addressing this priority took a lot of unscheduled work. But this new release was rejected with the following message:

[…]

Your extension contains multiple parts of code with one letter variables, making the code difficult to reviews. As our policies state, that you can read at https://extensionworkshop.com/documentation/publish/add-on-policies/#submission-guidelines, code must be provided in a way that is reviewable

— Message from the Add-ons review Team

The part cut at the top is a copy of the previous message, repeated as a preamble each time. But then comes a new problem, about a new rule, with a new link to online documentation.

Well, at my engineering school I was taught to use « i » as the iterator variable in for loops, for instance. It seems to be a common practice… To be honest, I was also using some other one-letter variables by convention, with the same letters everywhere in the code (for source definitions, source keys and so on…). What started as a small habit in a single place grew with the code and spread everywhere.

With no other feedback, and still in a hurry to fix things, I decided to replace all my one-letter variables with trigrams or groups of trigrams (src for source definition, src_key for source key…). I was not in the mood to search, by bisection, for the exact tolerance threshold of the automated review script regarding one-letter variables.
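To illustrate the kind of renaming involved, here is a hypothetical before/after pair; the function names and the `enabled` field are made up for the example and are not taken from the Meta-Press.es code base. Both versions behave the same, only the names change:

```javascript
// Before: one-letter names whose meaning only holds by convention.
function countEnabledBefore(s) {
  let n = 0;
  for (const k of Object.keys(s)) {
    if (s[k].enabled) n++;
  }
  return n;
}

// After: trigram-style names (src for a source definition, src_key for its key).
function countEnabledSources(sources) {
  let enabled_count = 0;
  for (const src_key of Object.keys(sources)) {
    const src = sources[src_key];
    if (src.enabled) enabled_count++;
  }
  return enabled_count;
}
```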

I sent 3 messages trying to get more details on the problems and explaining my moves. But the next release was rejected.

1.4. UTF-16 ranges in RegEx are considered obfuscation

At least I got a new error message:

  • Other, specifically Issue not covered by other reasons: As per our Source Code Submission guidelines, the source code code provided must be human readable.

A file subbmitted as part of the source code (js/core/source_fetching.js) is not readable. Please ensure that all files submitted as part of the source code submission are readable.

— Message from the Add-ons review team

This one was easy: I was using a big regular expression with lots of Unicode (UTF-16) ranges (to cut strings into words, for potentially every language supported by Unicode). While looking for a fix, I discovered a way to get rid of that big hand-written definition, using Unicode general category properties in the RegEx (\p{Letter}\p{Number}) instead of my previous ranges of worldwide punctuation.
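A minimal sketch of that approach, assuming the goal is simply to split a string into words. Unicode property escapes require the `u` flag and cover every script without any hand-written range; `splitWords` is a hypothetical name, not the actual Meta-Press.es function:

```javascript
// Any run of characters that are neither letters nor numbers, in any script, is a separator.
// \p{Letter} and \p{Number} are Unicode general category properties (the "u" flag is required).
const WORD_SEPARATOR_RE = /[^\p{Letter}\p{Number}]+/u;

function splitWords(text) {
  // filter(Boolean) drops the empty strings produced by leading or trailing separators.
  return text.split(WORD_SEPARATOR_RE).filter(Boolean);
}
```

For instance, `splitWords("Meta-Press.es : déjà 1000 sources, Привет")` returns `["Meta", "Press", "es", "déjà", "1000", "sources", "Привет"]`. Combining marks (category \p{Mark}, such as Devanagari vowel signs) are neither letters nor numbers, so words written with them would be cut into pieces unless \p{Mark} is added to the class.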

I submitted this new version on the 9th of October, the day before my talk at OSSym24. If we have to deal with automated tests, we could hope for those tests to run quickly… but this v1.8.17.4 was only approved on the 16th of October, nearly one month after the whole story began for me.

No new feature here, but admittedly improved code.

2. Switch to manifest version 3

Another silent change since the previous release is the upgrade of Meta-Press.es to MV3: manifest version 3.

We are talking here about the manifest.json file, which contains metadata about the WebExtension, allowing the web browser to load it. The manifest file lists, for instance, the name and icon of the WebExtension, which version is presented, which actions should be registered… If it isn’t filled in correctly, the web browser won’t load the WebExtension (not knowing what to do with it).

2.1. Embrace, extend and extinguish

Back in 2017, when Meta-Press.es development started, the current version of this manifest file was manifest v2. It already marked a move by Mozilla toward the WebExtension norm proposed by Google, and it forced Mozilla to abandon all the work put into the previous form of Firefox add-ons (XUL). To be honest, I was really happy to avoid the XUL stack when I started Meta-Press.es.

5 years later, in 2022, the main JavaScript stakeholders had worked a lot on an evolution of this norm to allow new usages and improve security. But not only that. Following the famous Embrace, extend, and extinguish strategy developed by Microsoft to attack open standards in favor of its proprietary products, Google decided to use this coming evolution of the WebExtension norm (which had become a standard through Firefox’s and Edge’s adoption) to push its own commercial agenda as an online advertising seller. To cut a long story short, with manifest v3 as imposed by Google, it is no longer possible to code ad-blockers like uBlock Origin.

So Mozilla got back to work and found a way to support manifest v3 in Firefox, while extending it so that ad-blockers can keep existing.

Google was really disappointed and took 2 years to consider whether or not to keep rolling out manifest v3. Finally they announced a new schedule for manifest v3 adoption in their web browser over the course of 2024. Will they really leave to Mozilla a feature used by millions of users (9 million on Firefox alone at the time of writing this blog post)? This would be a new sensible reason to use Firefox: there should be no more ad-blockers in Chromium et al. in the near future. At the time of this writing, an up-to-date Chromium under Artix Linux only states that uBlock Origin may soon no longer be available.

As Firefox also works with manifest version 3, Meta-Press.es was upgraded to it, thanks to the NGI Zero program operated by NLnet.

2.2. Manifest v2 vs Manifest v3

For our use case, the differences are not much of an improvement.

Background pages, which were the way to get a script running in the browser rather than in a particular web page, are replaced by background scripts (without the web page features) in Firefox and by service workers in Chromium. Both are put to sleep after some idle time, so the automated searches of Meta-Press.es currently don’t work well (despite using the recommended alarms API).
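For reference, here is a minimal sketch of how the alarms API is usually wired in an MV3 background script. It shows the common pattern, not a fix for the idle problem described above. The alarm name, the helper names and the `runScheduledSearches` callback are hypothetical; the alarms API object is passed as a parameter (`browser.alarms` in Firefox) so that the logic can be exercised outside the browser:

```javascript
// Hypothetical alarm name for scheduled searches.
const SCHEDULED_SEARCH_ALARM = "scheduled-search";

// Call synchronously at the top level of the background script: listeners registered
// there are what lets the browser restart an idle background script to deliver the event.
function registerScheduledSearch(alarmsApi, runScheduledSearches) {
  alarmsApi.onAlarm.addListener((alarm) => {
    if (alarm.name === SCHEDULED_SEARCH_ALARM) runScheduledSearches();
  });
}

// Creating an alarm with an existing name replaces it (and restarts its timer),
// so only create it when it is missing.
async function ensureScheduledSearchAlarm(alarmsApi, periodInMinutes) {
  const existing = await alarmsApi.get(SCHEDULED_SEARCH_ALARM);
  if (!existing) alarmsApi.create(SCHEDULED_SEARCH_ALARM, { periodInMinutes });
}
```

In the extension, this would be called as `registerScheduledSearch(browser.alarms, runScheduledSearches)` followed by `ensureScheduledSearchAlarm(browser.alarms, 60)`.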

The permissions required by WebExtensions were refined, and I took this opportunity to implement a feature suggested years ago: embedding the exact list of reachable sources in the manifest. Each version of Meta-Press.es now comes with its list of host permissions fully declared, which avoids asking for them later. But the optional host permission <all_urls> is still present, to allow users to add new sources by themselves.
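A minimal build-time sketch of that idea, assuming the source definitions can be reduced to a list of search URLs. The URLs, the helper names and the use of the `optional_host_permissions` key for `<all_urls>` are assumptions, not the actual Meta-Press.es build script. Each URL is turned into a host match pattern for `host_permissions`:

```javascript
// Turns a search URL into a match pattern covering its whole host, e.g. "https://host/*".
function hostPatternFromUrl(url) {
  const { protocol, hostname } = new URL(url);
  return `${protocol}//${hostname}/*`;
}

// Deduplicated, sorted host permissions for a list of source search URLs.
function buildHostPermissions(sourceUrls) {
  return [...new Set(sourceUrls.map(hostPatternFromUrl))].sort();
}

// Only the permission-related keys of a manifest v3 file (assumed layout).
function buildPermissionKeys(sourceUrls) {
  return {
    manifest_version: 3,
    host_permissions: buildHostPermissions(sourceUrls),
    // Still requested at runtime when a user adds a source of their own.
    optional_host_permissions: ["<all_urls>"],
  };
}
```

Deduplicating per host keeps the list short when several search URLs of the same source share a domain.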

3. No dates on results ?

What makes a type of web entry or search result directly usable as a source in Meta-Press.es is the presence of dates on results. It’s our eternal quest: can we get this metadata? With this model, Meta-Press.es could seamlessly extend its source scope (and so its search capabilities) from news to podcasts, agendas, videos and even jobs.

But there are also a lot of legitimate newspapers that fail to show dates on the results of their internal search engine.

In a previous blog post, we saw a source (La Charente Libre) where dates only appeared if results were sorted in chronological order. But that was a lucky strike.

Here is another trick, used for some Meta-Press.es sources like RadioClassique.fr or VoxEurope.eu. It only applies to sources with illustrated results. Often, each illustration is uploaded individually while preparing the publication, usually on the day the article is released. In addition, the URL of the illustration sometimes contains this upload date (WordPress, for instance, can behave this way).

In such cases, we just have to point the silver scissors of Meta-Press.es at the illustration URL (via a CSS selector) and extract the date from it with a simple RegEx. Et voilà!
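A minimal sketch of the RegEx part, assuming the CSS selector has already returned the illustration URL (for instance the `src` of an `img` element). The pattern and the `dateFromIllustrationUrl` name are hypothetical, not the ones used in Meta-Press.es source definitions:

```javascript
// Matches ".../YYYY/MM/" in a URL path, optionally followed by "DD/".
// WordPress-style upload paths usually carry only the year and month.
const URL_DATE_RE = /\/(\d{4})\/(\d{2})\/(?:(\d{2})\/)?/;

// Returns a UTC Date built from the first date-like path segment, or null.
function dateFromIllustrationUrl(url) {
  const match = URL_DATE_RE.exec(url);
  if (!match) return null;
  const [, year, month, day] = match;
  const monthIndex = Number(month) - 1;
  // Without a day in the URL, fall back to the first day of the month.
  const dayNumber = day ? Number(day) : 1;
  if (monthIndex < 0 || monthIndex > 11 || dayNumber < 1 || dayNumber > 31) return null;
  return new Date(Date.UTC(Number(year), monthIndex, dayNumber));
}
```

With a month-only path such as `https://example.org/wp-content/uploads/2024/09/cover.jpg`, the result is the 1st of September 2024 (UTC): month-level precision is the price of this trick when the day is absent.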

Asking sources to improve their presentation rarely yields results. Meta-Press.es, on the contrary, already works with a thousand sources because there is no need to ask for their permission. It works despite the sources. And sometimes, even despite sources that omit their result dates.

4. 20 new Scrutari-based sources

Speaking of sources, this new release brings its share of novelties: 20 sources were added to Meta-Press.es for the different registered users of the Scrutari search engine (and their different languages).

Scrutari is a libre software search engine project. It can fetch metadata from the websites of registered users, build indexes and offer a full-featured web interface to search through that content.

It was a pleasure to work with the Scrutari developer, who did a great job presenting the data to ease the integration of the 20 Scrutari sources into Meta-Press.es. It’s the exact opposite of the previous section, where Meta-Press.es had to hack the metadata out of a source. Here, Scrutari created a special Meta-Press.es profile to present its JSON API responses the way Meta-Press.es expects them.

These new sources open a window on more than 70,000 documents, gathered by the Coredem.info initiative, which brings together 40 entities…

5. When asking for too much results

Some sources allow setting the number of expected results in their queries. Values from 10 to 30 are usually safe. Tests were conducted with 999 and, guess what, it breaks a lot of sources.

So a new source-definition notation was created to reflect those upper limits. For instance, Reuters provides at most 99 results, so its search URL now includes a {<100} replacement token as a query parameter, and Meta-Press.es won’t try to fetch more results from this source, even if you ask for 5000.

Working on this subject revealed a variety of scenarios: Der Spiegel uses {<51}, for instance, FAZ.net {<101}, while MediHAL / Archives-Ouvertes.fr accepts only three values: 30, 50 or 100.
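The semantics of the token can be sketched as follows, assuming `{<N}` simply means "strictly fewer than N": the requested number of results is capped at N - 1. `applyMaxResultsToken` is a hypothetical helper, not the Meta-Press.es implementation, and it does not cover sources like MediHAL that accept only a fixed set of values:

```javascript
// "{<N}" means: the source accepts strictly fewer than N results per query.
const MAX_RESULTS_TOKEN_RE = /\{<(\d+)\}/g;

// Replaces each "{<N}" token with the requested count, capped at N - 1.
function applyMaxResultsToken(searchUrl, requestedCount) {
  return searchUrl.replace(MAX_RESULTS_TOKEN_RE, (_token, limit) =>
    String(Math.min(requestedCount, Number(limit) - 1)));
}
```

With a hypothetical `https://news.example.org/search?size={<100}`, asking for 5000 results yields `size=99`, while asking for 20 yields `size=20`.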

6. JavaScript code linting : quick-lint-js

After losing a couple of hours, once again, trying to get ESLint working with its new flat config file, I looked for alternatives and found quick-lint-js.

It’s a mature solution, widely packaged. It’s immediately usable (zero configuration) and exquisitely fast (it claims to be 90x faster than ESLint).

quick-lint-js proved itself to be a tool, not another problem to solve.

7. #FixTheWorld : FranceTVInfo

To finish, here is a call for action.

FranceTVInfo is a major state-owned source of information in France. It started as an official state-owned radio station looping "news", was then turned into a TV news channel (contributing to this lowest level of "journalism") and is now also a website.

It’s THE propaganda voice of the state. But still, it can’t be added to Meta-Press.es because its search results carry no dates.

If you, reading this, can whisper in the right ears, it would be great to get this basic feature: dates on results…

Plenty of other missions are listed under the #FixTheWorld hashtag used by the official @MetaPress Mastodon account.

For instance, Mediapart still can’t provide exact results (nor provide them as an RSS feed).

Your turn to play!