2018-05-10 10:42 AM
I would like to add a custom feed that uses the URL as an indicator. Mapping domains to alias.host does not seem very beneficial, as it creates a lot of noise and can trigger on traffic that is not actually an indicator. Since URL does not appear to be indexed, I suspect this would be a problem and would not work as expected. Does anyone have a clever way of pulling in a feed that focuses on a list of specific URLs rather than purely IPs and domains?
'http://www.example1234.com/content-access/article.html'
or
'http://www.example1234.com/content-access/eng/access-in/index.html'
vs.
'www.example1234.com'
2018-05-10 11:03 AM
Jay,
You can create feeds based off of non-indexed meta keys. Essentially, you can create a feed from almost ANY meta key in the solution. You would just use that meta key as the meta callback.
If you currently do not have URL meta, obviously you would need that first. I know some proxy logs will supply the entire URL. A while back I wrote a parser to concatenate meta values together to build the URL. It was posted to this community site.
Please note that any TEXT formatted meta key is limited to 256 bytes of data. Therefore, if the URL is too long, it may not match, since feeds, as you may know, are exact matches.
There may be some less efficient ways of doing the match with a parser, but I haven't explored them.
Chris
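As a rough illustration of the 256-byte limit mentioned above, the following Python sketch splits a candidate URL list into values that would fit in a TEXT meta key and values that are too long to match exactly. The function name `split_by_meta_limit` and the sample URLs are hypothetical. The 256-byte figure comes from the post above; measuring the length in UTF-8 bytes and treating the limit as inclusive are assumptions.

```python
# Limit for TEXT formatted meta values, as stated in the post above.
TEXT_META_LIMIT_BYTES = 256


def split_by_meta_limit(urls, limit=TEXT_META_LIMIT_BYTES):
    """Return (usable, too_long) lists based on UTF-8 encoded length.

    Assumption: a value of exactly `limit` bytes still fits.
    """
    usable, too_long = [], []
    for url in urls:
        value = url.strip()
        if not value:
            continue  # skip blank lines in the input list
        if len(value.encode("utf-8")) <= limit:
            usable.append(value)
        else:
            too_long.append(value)
    return usable, too_long


if __name__ == "__main__":
    candidates = [
        "http://www.example1234.com/content-access/article.html",
        "http://www.example1234.com/" + "a" * 300,
    ]
    ok, rejected = split_by_meta_limit(candidates)
    print(len(ok), "usable,", len(rejected), "too long")
```

Running a check like this before publishing a feed makes it easier to spot entries that could never match because the registered meta value would be cut short.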
2018-05-10 11:32 AM
There is an option to HTTP_lua to register url. There are some caveats:
2018-05-17 02:53 PM
In your second bullet you reference registerComponents. I don't see that function in the http_lua_options.lua. Does this mean I need to write a function registerComponents() in http_lua_options.lua that returns true?
/Dion
2018-05-17 03:31 PM
It's there, or at least should be, as the first listed option. Perhaps you have an old version of the options file?
Here's the latest from Live:
module("HTTP_lua_options")
-- 2018.05.02.1
function registerComponents()
    --[=[
        "Register Path Components" : default TRUE (or FALSE if "Register URL" enabled)

        IMPORTANT: It is strongly advised to not enable both this option and
        "Register URL".

        Register directory, filename, extension, query (from the request path) and
        host (from the HOST: header) as discrete meta.

        For example,

            alias.host: www.example.com
            directory: /someDir/
            filename: somefile.html
            extension: html
            query: ?foo=bar

        If the "Register URL" option below is enabled, then this option defaults
        to FALSE. To register both discrete meta from the path components and
        a reconstructed url, both options must be explicitly enabled. It is
        strongly advised not to enable both options.
    --]=]
    --return true
end

function registerUrl()
    --[=[
        "Register URL" : default FALSE

        IMPORTANT: It is strongly advised to not enable both this option and
        "Register Path Components".

        Default behavior is to register directory, filename, extension, query
        (from the request path) and host (from the HOST: header) as discrete meta.

        This option will instead register them as a single meta value (individual keys
        will not be registered due to redundancy),

            url: www.example.com/someDir/someFile.html?foo=bar

        Note that the registered URL is a reconstructed approximation - it may not be the
        exact URL that was 'clicked on'.
    --]=]
    return false
end

function splitQuery()
    --[=[
        "Split Query String" : default false

        Default behavior is for the entire querystring from a request to be registered as a
        single meta value:

            query: alpha=one&beta=two&gamma=three

        If this option is enabled, then each element of a querystring will be registered as
        individual meta values:

            query: alpha=one
            query: beta=two
            query: gamma=three
    --]=]
    return false
end

function useOrigIP()
    --[=[
        "Use orig_ip" : default TRUE

        Default behavior is to register values from x-forwarded-for headers and the like
        with index key "orig_ip".

        If this option is disabled, then values will be registered as following:

            hostnames        "alias.host"
            IPv4             "alias.ip"
            IPv6             "alias.ipv6"
            email address    "email"
            other            "alias.host"
    --]=]
    return true
end

function refererPath()
    --[=[
        "Referer Path" : default FALSE

        Default behavior is to register the value of a "Referer:" header as "referer" meta.

        If this option is enabled, then the host, directory, filename, extension, and querystring
        values will be broken out from Referer and registered individually. In order to
        avoid duplication, the entire Referer value will not be registered.

        For example, given the header:

            Referer: http://www.example.com/hello/world.html?foo=bar&one=two

        If this option is disabled (default), then the following meta will be registered:

            referer: http://www.example.com/hello/world.html?foo=bar&one=two

        If enabled, then the following meta will be registered:

            alias.host: www.example.com
            directory: /hello/
            filename: world.html
            extension: html
            query: foo=bar&one=two

        Note that if the "Split Query String" option is also enabled then the querystring
        will instead be registered individually (see above).
    --]=]
    return false
end

function userAgent()
    --[=[
        "User-Agent Key" : default "client"

        Default behavior is to register the value of User-Agent headers with the 'client'
        index key.

        Modifying this value will cause User-Agent values to additionally be registered
        with the specified key. If the key does not already exist it will be created - normal
        key name restrictions apply.

        Note that this will result in duplication of meta. User-agent will be registered
        to both "client" and the specified key.
    --]=]
    return "client"
end

function respReason()
    --[=[
        "Response Code Reason" : default TRUE

        For response codes other than 2xx, default behavior is to register both the status
        code and reason phrase together as error meta. For example,

            error: 404 Not Found

        Disabling this option (setting to false) will cause only the response code to be
        registered. For example,

            error: 404
    --]=]
    return true
end

function decompress()
    --[=[
        "Decompress" : default 0

        Decompress content-encoded HTTP responses. Encodings gzip,
        deflate, and chunked are supported. Enabling this provides
        visibility into such responses to other parsers.

        Decompression incurs a performance penalty which will vary
        depending upon the prevalence of compressed or encoded HTTP
        responses seen in the environment. This can be ameliorated
        to some extent by choosing to only decompress specific content
        types.

        This is a bit-packed value representing the content types to
        decompress, where:

              1    application/*
              2    audio/*
              4    font/*
              8    image/*
             16    message/*
             32    model/*
             64    text/*
            128    video/*

        The default value of 0 means that decompression will not be
        performed for any content type, which maximizes performance.

        A value of 65 specifies that content-types "application" and
        "text" will be decompressed. This should provide a good
        balance of visibility and performance.

        To maximize visibility, a value of 255 will enable decompression
        of all content types.

        Enabling a content-type enables all constituent sub-types. For
        example, "application" includes "application/octet-stream",
        "application/javascript", etc.

        NOTES:

            Only valid for versions 11.0+. This option has no effect on
            versions 10.x or older as they do not have the capability to
            decompress encoded HTTP responses.

            Has no effect on instances of compression which are not HTTP
            responses, such as compressed archive files (zip, rar, et al),
            LZMA streams, etc.
    --]=]
    return 0
end

function advanced()
    --[=[
        "Advanced Analysis" : default FALSE

        Perform advanced analysis of HTTP characteristics. Analysis includes only the first
        request and first response. Meta is registered to the key "analysis.service".
    --]=]
    return false
end

function customHeaders()
    --[=[
        "Custom Headers" : default NONE

        Beware of excessive duplication, which will impact performance and retention. Meta
        registered will be in addition to, not replacement of, standard meta registration.
        In other words, if you specify "user-agent" headers be registered to key "foo", it
        will still also be registered to alias.host (or alias.ip / alias.ipv6 if appropriate).

        Syntax is,

            ["header"] = "key",

        Where,

            "header" is the desired HTTP header in lowercase. Do not include spaces, colons, etc.

            "key" is the desired meta key with which to register the value of that header

        Key names must be 16 characters or less, and consist only of alphanumeric, dots, and
        hyphens. Keys specified that do not meet these requirements will be modified in order
        to conform.

        Keys listed here are registered as format="Text". Don't use keys indexed in other formats.
    --]=]
    return {
        --["origin"] = "referer",
    }
end
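The bit-packed "Decompress" value described in the options file can be worked out by hand, but a small sketch may make the arithmetic easier to check. The Python below is only an illustration that runs outside the product; the helper names `decompress_value` and `enabled_types` are hypothetical, and the bit values are copied from the option comment in the listing.

```python
# Bit values copied from the "Decompress" option comment in the listing above.
CONTENT_TYPE_BITS = {
    "application": 1,
    "audio": 2,
    "font": 4,
    "image": 8,
    "message": 16,
    "model": 32,
    "text": 64,
    "video": 128,
}


def decompress_value(content_types):
    """Combine top-level content types into one bit-packed integer."""
    value = 0
    for name in content_types:
        value |= CONTENT_TYPE_BITS[name]  # raises KeyError on an unknown type
    return value


def enabled_types(value):
    """List the content types whose bit is set in `value`."""
    return [name for name, bit in CONTENT_TYPE_BITS.items() if value & bit]


if __name__ == "__main__":
    print(decompress_value(["application", "text"]))  # 65
    print(decompress_value(CONTENT_TYPE_BITS))        # 255
    print(enabled_types(65))                          # ['application', 'text']
```

The resulting integer is what would go after `return` in `decompress()`, for example 65 for "application" plus "text".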
2018-05-17 04:07 PM
Yep. I had an older version of http_lua_options.lua. Thanks for the reply.
/Dion
2018-05-24 01:18 PM
I must be doing something wrong.
Any help would be greatly appreciated!
2018-05-24 02:03 PM
Chris Ahearn's parser writes to the key "url", not "url.custom" (unless you changed it). If you explicitly set both "Register Path Components" and "Register URL" to true in the http_lua_options file, the URL will again be written to the "url" meta key. There is no need to add the entry to index-decoder-custom.xml, as any newly registered keys are created automatically; you would only need to add it if you want to explicitly change the "format" of a custom key to something other than "Text". Check whether you are already registering "url" meta.
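Because feeds are exact matches, the values in the feed file need to have the same shape as the registered "url" meta. The options file quoted earlier in the thread shows an example without a scheme (www.example.com/someDir/someFile.html?foo=bar), so a list of full URLs would probably need the "http://" prefix removed first. The Python sketch below shows one way to do that with the standard library; `to_url_meta_form` and `build_feed_rows` are hypothetical helper names, the two-column CSV layout is an assumption, and whether the parser's reconstructed value matches this form exactly should be verified against real "url" meta.

```python
import csv
import sys
from urllib.parse import urlsplit


def to_url_meta_form(raw_url):
    """Strip the scheme so the value resembles 'host/path?query'.

    Assumption: the registered "url" meta carries no scheme, as in the
    example from the options file.
    """
    raw_url = raw_url.strip()
    # urlsplit only recognises the host when a scheme or '//' is present.
    if "://" not in raw_url:
        raw_url = "//" + raw_url
    parts = urlsplit(raw_url)
    value = parts.netloc + parts.path  # netloc keeps any port or userinfo
    if parts.query:
        value += "?" + parts.query
    return value  # any '#fragment' is dropped


def build_feed_rows(raw_urls, description):
    """Return de-duplicated [value, description] rows in input order."""
    seen = set()
    rows = []
    for raw in raw_urls:
        value = to_url_meta_form(raw)
        if value and value not in seen:
            seen.add(value)
            rows.append([value, description])
    return rows


if __name__ == "__main__":
    urls = [
        "http://www.example1234.com/content-access/article.html",
        "http://www.example1234.com/content-access/eng/access-in/index.html",
    ]
    # Hypothetical two-column layout: match value, then a label.
    csv.writer(sys.stdout).writerows(build_feed_rows(urls, "suspicious url"))
```

The first column would then be the value matched against the "url" callback key, with the second column available as the meta value the feed writes.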