.ASM files ignored? #6943
You beat me to it. It's quite common for files that have come from Windows systems to be identified as binaries, but it's usually because of a byte order mark (BOM), which IIRC things like Visual Studio and other older dev tools add to files. This is a new one for me.
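For anyone wanting to rule out the BOM case first, here is a minimal sketch in plain Ruby. The helper names `utf8_bom?` and `strip_utf8_bom` are hypothetical, made up for illustration; they are not part of Linguist or charlock_holmes, and the sketch only looks at the UTF-8 BOM (bytes EF BB BF).

```ruby
# Hypothetical helpers for illustration only; not part of Linguist or charlock_holmes.
UTF8_BOM = "\xEF\xBB\xBF".b

# True when the raw bytes start with a UTF-8 byte order mark.
def utf8_bom?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# Returns a binary copy of the bytes with a leading UTF-8 BOM removed, if present.
def strip_utf8_bom(bytes)
  data = bytes.b
  utf8_bom?(data) ? data.byteslice(3, data.bytesize - 3) : data
end
```

Something like `utf8_bom?(File.binread('ADVGRP.ASM'))` would tell you whether a BOM is present at all.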
It's not up to Linguist. We use charlock_holmes to detect binaries:
linguist/lib/linguist/blob_helper.rb Lines 122 to 142 in 39fd5e9
irb(main):001> require 'charlock_holmes'
=> true
irb(main):002> contents = File.read('ADVGRP.ASM')
=> "; [ This translation created 10-Feb-83 by Version 4.3 ]\r\n\r\n\t.RADIX 8\t\t; To be safe\r\n\r\nCSEG\tSEGMENT PUBLIC 'CODESG' \r\n\tASSUME CS:CSEG\r\n\r\nINCLUDE\tOEM.H\r\n\r\n\tTITLE ADVGRP - ADVANCE...
irb(main):003> CharlockHolmes::EncodingDetector.detect(contents)
=> {:type=>:binary, :confidence=>100}
irb(main):004>
Yes
Possibly. We'd have to be mindful of the risk of stripping legit uses (I have no idea if there are any).
I was looking at https://github.com/microsoft/GW-BASIC and it contains a lot of Assembly code (.ASM files).
However, when running github-linguist, it seems that those files are ignored. Am I missing something?
github-linguist output:
output for specific file:
Why is it detected as binary in the GW-BASIC repo and detected correctly in https://github.com/microsoft/MS-DOS?
EDIT: It turns out, after opening the file in VS Code, that there are a few NUL bytes at the end of the file.
Should Linguist ignore NULs at the end of a file?
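To make the question concrete, here is a rough sketch in plain Ruby of what "ignoring NULs at the end of a file" could look like. This is not Linguist's behaviour: `trailing_nul_count`, `strip_trailing_nuls` and `nul_free_after_trim?` are hypothetical names made up for illustration. The sketch only touches the run of NUL bytes at the very end, so NULs elsewhere in the content are left alone and can still be reported.

```ruby
# Hypothetical helpers for illustration only; not how Linguist behaves today.

# Number of consecutive NUL bytes at the very end of the content.
def trailing_nul_count(bytes)
  data = bytes.b
  count = 0
  count += 1 while count < data.bytesize && data.getbyte(-1 - count) == 0
  count
end

# Binary copy of the content without the trailing run of NUL bytes.
def strip_trailing_nuls(bytes)
  data = bytes.b
  data.byteslice(0, data.bytesize - trailing_nul_count(data))
end

# True when no NUL bytes remain once the trailing run is removed,
# i.e. any NULs in the file were padding at the end only.
def nul_free_after_trim?(bytes)
  !strip_trailing_nuls(bytes).include?("\x00")
end
```

For example, `trailing_nul_count(File.binread('ADVGRP.ASM'))` would show how many NULs pad the file. Whether the trimmed content is then detected as text is a separate question that only re-running the detector can answer.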