A month and a half after the last release, version 1.7.8 of Meta-Press.es is now online. This new version brings ergonomic enhancements and a major round of fixes for the known sources.
The enhancements include some long-awaited requests:
- a date slice filter, with two inputs, to work on local results
- a search input to easily find a particular source in the source box of a finished search, when more than 30 sources are listed there
- "select all", "select none" and "toggle selection" buttons when selecting results to export. These buttons only affect the results visible on the current page (and it is still possible to choose how many elements are listed per page)
- the list of the sources we are still waiting for, when a search takes a noticeable time (it can be expanded from the search status line when fewer than 30 sources are awaited)
- a Cancel button that actually stops the running search where it is and lets you work on the results already received. The previous solution just refreshed the page, losing the results; this is now done via the recent JavaScript promise aborting API, thanks to a mention from @lutindiscret
- consequently, a new setting appeared: a request timeout, which automatically ends a search after 90 s (it can be set to 0 to wait "forever")
- a new source statistics line, which displays the number of selected sources and the number of permissions needed to perform the next search, along with a button to grant those permissions
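The Cancel button and the request timeout can be sketched with the standard AbortController API, which is presumably the "promise aborting API" mentioned above. This is a minimal illustration and not the Meta-Press.es code: `searchSources`, its options and the injectable `fetchFn` are hypothetical names, and the sketch assumes each source is queried with `fetch`.

```javascript
// Hypothetical helper, for illustration only (not Meta-Press.es code).
function searchSources(urls, { timeoutMs = 90_000, fetchFn = fetch } = {}) {
  const controller = new AbortController()
  // A timeout of 0 means "wait forever": no timer is armed.
  const timer = timeoutMs > 0
    ? setTimeout(() => controller.abort(), timeoutMs)
    : null
  // Every request shares the same signal, so one abort() stops them all.
  const requests = urls.map(url =>
    fetchFn(url, { signal: controller.signal }).then(res => res.text())
  )
  // allSettled keeps the answers that arrived before the abort.
  const results = Promise.allSettled(requests)
    .finally(() => clearTimeout(timer))
  return {
    results,
    cancel: () => controller.abort(),
    signal: controller.signal,
  }
}
```

Calling `cancel()` (or reaching the timeout) rejects only the requests that are still pending, so `results` resolves with the responses already received marked as `fulfilled` and the aborted ones marked as `rejected`.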
In addition, every regular expression of the 314 sources (which already represent 10k lines of formatted JSON) has been screened for ReDoS vulnerabilities using RegexStaticAnalysis.
Out of 180 regexes analysed, 25 were flagged with an exponential degree of ambiguity (EDA) or an infinite degree of ambiguity (IDA). Each time, it was related to unclear boundaries, multiple infinite quantifiers (* or +), or an OR construct such as (a|a)* with an infinite quantifier.
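To get a feel for what such an ambiguity costs, here is a small experiment. The two patterns are demo patterns chosen for this sketch, not taken from a Meta-Press.es source, and `timedTest` is a hypothetical helper. Timings depend on the machine and on the regex engine, so none are given here.

```javascript
// Demo patterns only, not taken from a Meta-Press.es source definition.
const ambiguous = /^(a|a)*$/ // each "a" can be matched by either branch
const tight = /^a*$/         // same accepted strings, a single way to match them

// Returns the match result and the elapsed time in milliseconds.
function timedTest(regex, subject) {
  const start = Date.now()
  const matched = regex.test(subject)
  return { matched, ms: Date.now() - start }
}

// A run of "a" followed by a character that makes the overall match fail
// forces a backtracking engine to try every way of reading the run.
for (const n of [10, 15, 20]) {
  const subject = 'a'.repeat(n) + '!'
  console.log(n, timedTest(ambiguous, subject), timedTest(tight, subject))
}
```

Both patterns accept and reject exactly the same strings; only the number of ways the engine can try to match them differs, which is why the ambiguous one is expected to slow down sharply as `n` grows.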
Surprisingly, in each case it was possible to improve the RegExp so that it passes the test and runs faster (being more tightly bound to the subject to capture). For example, this simple and easy-to-read regular expression:
- (\d+) (.*) (\d*) [1];
captures a date (for instance: '23 july 2021') and was replaced by:
- ^(\d{1,2}) ([^ ]{3,9}) (\d{4})$ [2];
which captures the same date, but with boundaries around the portion of string (^ at the beginning and $ at the end) and a sharper description of each field to capture: an exact number of digits, and a month name that can contain French accented letters (like décembre) but no spaces. Real-life examples are usually a bit more complex, but the main idea is here.
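The effect of the tighter pattern can be checked with a few lines of JavaScript. `tightDate` is the replacement pattern quoted above; `looseDate` is an assumed stand-in for a loose pattern with unbounded quantifiers, and `parseDate` is a hypothetical helper written for this sketch.

```javascript
// Replacement pattern quoted in the article.
const tightDate = /^(\d{1,2}) ([^ ]{3,9}) (\d{4})$/
// Assumed stand-in for a loose pattern with unbounded quantifiers.
const looseDate = /(\d+) (.*) (\d*)/

// Hypothetical helper: returns { day, month, year }, or null without a match.
function parseDate(regex, text) {
  const m = regex.exec(text)
  return m ? { day: m[1], month: m[2], year: m[3] } : null
}

// Plain dates, with or without accented letters, are captured field by field.
console.log(parseDate(tightDate, '23 july 2021'))
console.log(parseDate(tightDate, '1 décembre 2020'))
// The anchors reject a date buried in surrounding text: this prints null.
console.log(parseDate(tightDate, 'Published 23 july 2021 by John Doe'))
// The loose pattern still matches, but with a wrong month and an empty year.
console.log(parseDate(looseDate, 'Published 23 july 2021 by John Doe'))
```

With the tight pattern, a string either is a date in the expected shape or is rejected outright, which makes parsing mistakes visible instead of silently storing a wrong month or year.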
Again, as with the accessibility audit, this work generally improved the parsing of the sources concerned, and is therefore a general improvement for Meta-Press.es.